Abstract



Asymptotic equivalence in Le Cam's sense for nonparametric regression experiments is 
extended to the case of non-regular error densities, which have jump discontinuities at their 
endpoints. We prove asymptotic equivalence of such regression models and the observation 
of two independent Poisson point processes which contain the target curve as the support 
boundary of its intensity function. The intensity of the point processes is of order of the 
sample size n and involves the jump sizes as well as the design density. The statistical 
model significantly differs from regression problems with Gaussian or regular errors, which 
are known to be asymptotically equivalent to Gaussian white noise models. 
1. Introduction

The goal of transforming nonparametric regression models into asymptotically 
equivalent statistical experiments, which describe continuous observations of a sto- 
chastic process, has stimulated considerable research activity in mathematical statis- 
tics. The continuous design in these limiting models simplifies the asymptotic analy- 
sis and makes statistical procedures more transparent because in the regression case 
the discrete design points generate distracting approximation errors. Most papers 
so far establish asymptotic equivalence of certain nonparametric regression models 
with nonparametric Gaussian shift experiments. In that Gaussian white noise experiment, a process is observed which contains the target function in its drift and a blurring Wiener process which is scaled with a factor of order $n^{-1/2}$, where $n$ denotes the original sample size. The basic equivalence result for standard Gaussian
regression with deterministic design has been established by Brown and Low (1996). 
Afterwards, many important extensions have been achieved. The case of univariate random design has been treated by Brown et al. (2002). Carter (2007) considers the case of unknown error variance and design density, and Reiß (2008) extends the results to the multivariate setting. Recently, the model with dependent
regression errors has been investigated in Carter (2009). The work by Grama and 
Nussbaum (1998) is the first to consider the important case of non-Gaussian errors 
which are, however, supposed to be included in an exponential family. Such classes of 
error distributions are also studied in Brown et al. (2010) where the regression error 
is supposed to be non-additive. General regular distributions for the additive error 
variables are covered in Grama and Nussbaum (2002), where only slightly more than standard Hellinger differentiability is required for the error density.

On the other hand, when allowing for jump discontinuities of the error density, 
the situation changes completely. Standard examples include uniform or exponential 
error densities. These types of error distributions are non-regular and we know from 
parametric theory that better rates of convergence and non-Gaussian limit distribu- 
tions can be expected. The faster convergence rates are attained only by specific 
estimators, e.g. employing extreme value statistics in their construction instead of 
local averaging statistics. The Nadaraya-Watson estimator and the local polynomial estimators are procedures of the latter type, which can be improved significantly under non-regular errors. Müller and Wefelmeyer (2010) establish improved minimax rates for regression functions which satisfy some Hölder condition. Hall and
van Keilegom (2009) derive a rigorous theory for the optimal convergence rates for 
nonparametric regression under non-regular errors and smoothness constraints up to 
regularity one on the target regression function. Their nonparametric minimax rates in dimension one are of the form $n^{-s/(s+1)}$ for Hölder regularity $s$. This is faster than the usual $n^{-s/(2s+1)}$-rate for regular regression, but slower than $n^{-2s/(2s+1)}$, the squared regular rate one would expect in analogy with the parametric rates. At first sight, this is
counter-intuitive, but may be explained by a Poisson instead of Gaussian limiting 
law. Many applications of non-regular regression models occur in the field of econo- 
metrics, see Chernozhukov and Hong (2004) for an overview and a precise asymptotic 
investigation of the parametric likelihood ratio process. Irregular regression problems 
are also closely related to nonparametric boundary estimation in image reconstruction, see the monograph of Korostelev and Tsybakov (1993). The problem of frontier estimation has also attracted considerable interest, see Gijbels et al. (1999) and the references therein.

In Janssen and Marohn (1994) weak asymptotic equivalence of the extreme or- 
der statistics of a one-dimensional localization problem with non-regular errors and 
a Poisson point process model is derived in a parametric setup. Poisson point processes and random measures also turn out to be useful for the precise asymptotic analysis of regression experiments with non-regular errors, see e.g. Knight (2001) for parametric linear models and Chernozhukov and Hong (2004) for general parametric regression. Yet a precise nonparametric statement is lacking. We intend
to fill this gap by rigorously proving asymptotic equivalence of nonparametric regres- 
sion experiments with non-regular errors with a Poisson point process (PPP) model. 
Therein the target parameter occurs as the boundary curve of the intensity function. 
Hence, the Gaussian structure of the process experiment is not kept. Nor is the scaling factor $n^{-1/2}$, which is changed into $n^{-1}$ in agreement with the parametric rate.

For a comprehensive review of PPP and their statistical inference we refer to Karr
(1991) and Kutoyants (1998). They discuss image reconstruction from laser radar 
as a practical application of support estimation of the intensity function of a PPP, 
which corresponds to identifying the target parameter in our PPP experiment. The 
asymptotic equivalence result therefore links interesting inference questions in both 
models which might prove useful in both directions. 

For the basic concept of asymptotic equivalence of statistical experiments we refer 
to Le Cam (1964) and Le Cam and Yang (2000). To grasp the impact let us just 
mention that asymptotic equivalence between two sequences of statistical models 
transfers asymptotic risk bounds for any inference problem from one model to the
other, at least for bounded loss functions. Moreover, asymptotic equivalence remains 
valid for the sub-experiments obtained by restricting the parameter class so that we 
shall also cover smoother nonparametric or just parametric regression problems. 



The paper is organized as follows. In Section 2 we introduce our models, state our main result in Theorem 2.1 and give a constructive description of the equivalence maps. In Section 3 we construct pilot estimators of the target functions, which will be employed to localize the model in Sections 4 and 6. The findings of Section 5 yield asymptotic equivalence of the PPP experiment and the regression model when the target functions are changed into approximating step functions. In Section 7 all the results are combined to complete the proof of Theorem 2.1. Section 8 discusses limitations and extensions of the results and gives a geometric explanation of the unexpected nonparametric minimax rate for Hölder classes.

2. Model and main result 

In this section we specify the statistical experiments under consideration. First we 
define the joint parameter space $\Theta$ of both the regression and the PPP experiment,
imposing standard smoothness constraints on the target function. 

Definition 2.1. For some constants $C_\Theta > 0$ and $\alpha\in(0,1]$ the parameter set $\Theta$ consists of all functions $\vartheta:[0,1]\to\mathbb{R}$ which are twice continuously differentiable on $[0,1]$ with $\|\vartheta\|_\infty\le C_\Theta$ and $\|\vartheta''\|_\infty\le C_\Theta$ and where the second derivative satisfies the Hölder condition

$$|\vartheta''(x)-\vartheta''(y)|\le C_\Theta|x-y|^\alpha,\qquad\forall x,y\in[0,1].$$

In the regression model $\Theta$ represents the collection of all admitted regression
functions. This parameter space will remain unchanged for all experiments considered 
here. 

Definition 2.2. We define the statistical experiment $\mathcal{A}_n$ in which the data $Y_{j,n}$, $j=1,\dots,n$, with

$$Y_{j,n} = \vartheta(x_{j,n})+\varepsilon_{j,n}, \qquad (2.1)$$

are observed. The deterministic design points $x_{1,n},\dots,x_{n,n}\in[0,1]$ are assumed to satisfy

$$x_{j,n} = F_D^{-1}\big((j-1)/(n-1)\big), \qquad (2.2)$$

where the distribution function $F_D:[0,1]\to[0,1]$ possesses a Lipschitz continuous Lebesgue density $f_D$ which is uniformly bounded away from zero. The regression errors $\varepsilon_{j,n}$ are assumed to be i.i.d. with error density $f_\varepsilon:[-1,1]\to\mathbb{R}^+$, which is Lipschitz continuous and strictly positive.

The conditions on the design are adopted from Brown and Low (1996). They 
imply that

$$d^{-1}/n\le x_{j+1,n}-x_{j,n}\le d/n, \qquad (2.3)$$

for all $n\in\mathbb{N}$, $j=1,\dots,n-1$ and a finite positive constant $d$.
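The following minimal sketch makes the design assumptions concrete. It generates the deterministic design (2.2) and computes the smallest constant $d$ in (2.3). The design density $f_D(x) = 1/2 + x$ is a hypothetical choice that is Lipschitz and bounded away from zero. Its distribution function $F_D(x) = x/2 + x^2/2$ can be inverted in closed form.

```python
import math


def design_points(n, F_D_inv):
    """Deterministic design (2.2): x_{j,n} = F_D^{-1}((j - 1)/(n - 1)), j = 1, ..., n."""
    return [F_D_inv((j - 1) / (n - 1)) for j in range(1, n + 1)]


def F_D_inv(u):
    """Inverse of the hypothetical F_D(x) = x/2 + x^2/2 on [0, 1] (density f_D(x) = 1/2 + x)."""
    # positive root of x^2/2 + x/2 - u = 0
    return -0.5 + math.sqrt(0.25 + 2.0 * u)


def spacing_constant(xs):
    """Smallest d with d^{-1}/n <= x_{j+1,n} - x_{j,n} <= d/n for all j, cf. (2.3)."""
    n = len(xs)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return max(n * max(gaps), 1.0 / (n * min(gaps)))


xs = design_points(1000, F_D_inv)
print(xs[0], xs[-1], spacing_constant(xs))
```

Since this $f_D$ takes values in $[1/2, 3/2]$, the mean value theorem bounds the gaps between $1/(1.5(n-1))$ and $2/(n-1)$. Hence $d$ stays bounded in $n$, and here it is close to 2.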

The error model describes the class of densities which are supported on $[-1,1]$, regular within $(-1,1)$ and which have jumps at their left and right endpoints. Note that by constant extrapolation the density $f_\varepsilon$ on $[-1,1]$ can always be written as

$$f_\varepsilon(x) = \mathbf{1}_{[-1,1]}(x)\cdot\varphi(x),$$

with a strictly positive Lipschitz continuous function $\varphi:\mathbb{R}\to\mathbb{R}$ satisfying for some constant $C_\varphi>0$

$$\sup_{x\ne y}\frac{|\varphi(x)-\varphi(y)|}{|x-y|}+\sup_{x}|\varphi(x)|\le C_\varphi. \qquad (2.4)$$

Instead of constant extrapolation, $\varphi$ may alternatively be continued such that $\varphi\in L^1(\mathbb{R})$ holds in addition.

Hence, experiment $\mathcal{A}_n$ describes a non-regular nonparametric regression model. We believe that the regularity condition on $\varphi$ in the interior $(-1,1)$ can be substantially relaxed, but at the cost of more involved estimation techniques. We have restricted our consideration to the specific interval $[-1,1]$ for convenience.

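In the PPP model the target function $\vartheta$ occurs as upper and lower boundary curves of the intensity functions of two independent Poisson point processes $X_1$ and $X_2$.

Definition 2.3. For functions $\vartheta\in\Theta$, the design density $f_D$ and the noise density $f_\varepsilon$ from above, we define the experiment $\mathcal{B}_n$ in which we observe two independent Poisson point processes $X_j$, $j=1,2$, on the rectangle $S=[0,1]\times[-C_\Theta-1,C_\Theta+1]\subseteq\mathbb{R}^2$ with respective intensity functions

$$\lambda_1(x,y) = f_D(x)\cdot\mathbf{1}_{[\vartheta(x),\,C_\Theta+1]}(y)\cdot nf_\varepsilon(-1),\qquad \lambda_2(x,y) = f_D(x)\cdot\mathbf{1}_{[-C_\Theta-1,\,\vartheta(x)]}(y)\cdot nf_\varepsilon(1), \qquad (2.5)$$

for all $(x,y)\in S$.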

Each realisation $X_j$ represents a measure mapping from the Borel subsets of $S$ to $\mathbb{N}\cup\{0\}$. Equivalently, $X_j$ may be characterized by the total number of points $X_j(S)$ together with the two-dimensional discrete probability distribution $X_j(\cdot)/X_j(S)$, see Karr (1991) or Kutoyants (1998) for more details on PPP. Thus, the underlying action space can be taken as a Polish space (e.g. the separable Banach space $L^1(S)$), so that asymptotic equivalence can be established by Markov kernels.

Figure 1 shows on the left a regression function proportional to $x\cos(10x)$ and $n=100$ corresponding equidistant observations of $\mathcal{A}_n$, corrupted by uniform noise on $[-1,1]$. A realisation of the equivalent PPP model $\mathcal{B}_n$ is shown on the right, with '+' and '-' indicating point masses of $X_2$ and $X_1$, respectively.

Figure 1. Left: Regression model $\mathcal{A}_n$ with uniform $U[-1,1]$ errors. Right: Equivalent Poisson point process model $\mathcal{B}_n$.

We may conceive $X_j$ as the random point measure $\sum_{k=1}^{N_j}\delta_{(x_k^j,y_k^j)}$. Here $N_j$ is drawn from a Poisson distribution with intensity $\|\lambda_j\|_{L^1(S)}$, and the $(x_k^j,y_k^j)$ are drawn according to the bivariate density $\lambda_j/\|\lambda_j\|_{L^1(S)}$. The vertical bounds $\pm(C_\Theta+1)$ for the domain $S$ are non-informative for $\vartheta\in\Theta$, but the boundedness avoids technicalities. The equivalent unbounded PPP can be described by infinite random point measures $\sum_{k=1}^\infty\delta_{(x_k^j,y_k^j)}$, where the $x_k^j$ are drawn according to the density $f_D$ and

$$y_k^2 = \vartheta(x_k^2)-\big(nf_\varepsilon(1)\big)^{-1}\sum_{l=1}^k z_l^2,\qquad y_k^1 = \vartheta(x_k^1)+\big(nf_\varepsilon(-1)\big)^{-1}\sum_{l=1}^k z_l^1$$

holds with exponentially distributed $(z_l^j)$ of mean one (all independent). In this form, the PPP already appears in Knight (2001), yielding the limiting law for parametric estimators in the non-regular linear model.
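The representation above translates directly into a simulation of $\mathcal{B}_n$. The following is a minimal sketch under these assumptions:

- uniform design ($f_D\equiv1$);
- uniform errors on $[-1,1]$, so that $f_\varepsilon(\pm1)=1/2$;
- $C_\Theta=1$;
- the hypothetical regression function $\vartheta(x)=\tfrac12x\cos(10x)$.

Points are generated from the unbounded representation and then restricted to $S$. Restricting a PPP to a subset again gives a PPP with the restricted intensity.

```python
import math
import random


def simulate_ppp_model(theta, n, f_left, f_right, sample_design, bound, rng):
    """Minimal sketch of experiment B_n on S = [0, 1] x [-bound, bound].

    Upper process X_1: y_k = theta(x_k) + (n f_left)^{-1} * (z_1 + ... + z_k),
    lower process X_2: y_k = theta(x_k) - (n f_right)^{-1} * (z_1 + ... + z_k),
    with z_l ~ Exp(1) and x_k drawn from the design density.
    Assumes |theta| <= bound - 1, so every point of S has |y - theta(x)| <= 2 * bound.
    """
    def one_side(rate, sign):
        points, gamma = [], 0.0
        while True:
            gamma += rng.expovariate(1.0)       # cumulative Exp(1) sums
            offset = gamma / rate                # distance from the boundary curve
            if offset > 2 * bound:               # all later points lie outside S
                return points
            x = sample_design(rng)
            y = theta(x) + sign * offset
            if -bound <= y <= bound:             # restrict to the rectangle S
                points.append((x, y))

    upper = one_side(n * f_left, +1.0)   # X_1: intensity n f_eps(-1) f_D above theta
    lower = one_side(n * f_right, -1.0)  # X_2: intensity n f_eps(1) f_D below theta
    return upper, lower


theta = lambda x: 0.5 * x * math.cos(10.0 * x)   # hypothetical target, |theta| <= 1/2
rng = random.Random(1)
upper, lower = simulate_ppp_model(theta, n=200, f_left=0.5, f_right=0.5,
                                  sample_design=lambda r: r.random(), bound=2.0, rng=rng)
print(len(upper), len(lower))
```

The expected number of points of $X_1$ is $nf_\varepsilon(-1)\int_0^1(C_\Theta+1-\vartheta(x))f_D(x)\,dx$, here roughly 200. So the counts grow linearly in $n$, as stated for the intensity.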

We present the main result of this work in the following theorem. 

Theorem 2.1. The statistical experiments $\mathcal{A}_n$ and $\mathcal{B}_n$ are asymptotically equivalent in Le Cam's sense as $n\to\infty$.

This asymptotic equivalence is achieved constructively by consecutive invertible (in law) and parameter-independent mappings of the data. These mappings generate new experiments whose observation laws are shown to be asymptotically close (uniformly over $\vartheta\in\Theta$ in total variation norm). To highlight the main ideas of the subsequent proof and to indicate how to use our theoretical result in practice, we give an algorithmic description of these equivalence mappings. They lead from experiment $\mathcal{A}_n$ to experiment $\mathcal{B}_n$ (in the version with unbounded domain).

(1) Take the data $Y_{j,n}$, $j=1,\dots,n$, from experiment $\mathcal{A}_n$.

(2) Split the data and bin one part: consider the odd indices $J_n=\{1,3,\dots,2\lceil n/2\rceil-1\}$ and intervals $I_k=[k/m,(k+1)/m)$ with some appropriate $m$. Put $X_1=(Y_{j+1,n})_{j\in J_n\setminus\{n\}}$ and $Z=(Z_j)_{j\in J_n}$ with

$$Z_j = Y_{j,n}-\hat\vartheta_1(\xi_j)-\hat\vartheta_1'(x_{j,n})(x_{j,n}-\xi_j),\qquad j\in J_n.$$

Here $\xi_j$ is the centre of the interval $I_k$ with $x_{j,n}\in I_k$, and $\hat\vartheta_1$ is a (good) estimator of $\vartheta$ based on the data $X_1$.

(3) Consider the local extremes in $Z$, i.e. $s_k=\min\{Z_j: x_{j,n}\in I_k\}$ and $S_k=\max\{Z_j: x_{j,n}\in I_k\}$, $k=0,\dots,m-1$.

(4) Use $\hat\vartheta_1$ on the data $X_1$ again to transform $s_k'=s_k+\hat\vartheta_1(\xi_k)+1$ and $S_k'=S_k+\hat\vartheta_1(\xi_k)-1$.

(5) Randomization to build the PPP $X_l$, $X_u$. On each interval $I_k$ generate $(x_k^l,y_k^l)$, where $x_k^l$ has the density $f_k=f_D\mathbf{1}_{I_k}/\int_{I_k}f_D$ independent of everything else, and $y_k^l=S_k'-\hat\vartheta_1'(x_k^l)(\xi_k-x_k^l)$. Define the PPP $X_l$ where, independently on each $I_k$, we observe a point mass in $(x_k^l,y_k^l)$ plus, independently (conditionally on $S_k'$, $\hat\vartheta_1'$), a PPP with intensity

$$\tfrac{n}{2}f_\varepsilon(1)f_D(x)\,\mathbf{1}\{x\in I_k,\ y<S_k'-\hat\vartheta_1'(x)(\xi_k-x)\}.$$

Analogously, generate $(x_k^u,y_k^u)$ independently with the density $f_k$ and $y_k^u=s_k'-\hat\vartheta_1'(x_k^u)(\xi_k-x_k^u)$. Then use the intensity

$$\tfrac{n}{2}f_\varepsilon(-1)f_D(x)\,\mathbf{1}\{x\in I_k,\ y>s_k'-\hat\vartheta_1'(x)(\xi_k-x)\}$$

to build $X_u$ independently, conditionally on $s_k'$, $\hat\vartheta_1'$.

(6) Use a (good) estimator $\hat\vartheta_2$ based on the PPP data $X_2=(X_l,X_u)$ and redo steps (2)-(5) to transform $X_1$ via $Z_{j+1}=Y_{j+1,n}-\hat\vartheta_2(\xi_{j+1})-\hat\vartheta_2'(x_{j+1,n})(x_{j+1,n}-\xi_{j+1})$, $j\in J_n\setminus\{n\}$. This yields another couple $(X_l',X_u')$ of PPP. The final PPP are obtained by $X_1=X_u+X_u'$ and $X_2=X_l+X_l'$.
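Steps (2)-(4) are purely deterministic given the pilot estimator. The following minimal sketch computes $s_k'$ and $S_k'$ from the odd-index half of the sample. The pilot estimator and its derivative are passed in as hypothetical callables; in the procedure above they would be computed from $X_1$. The randomization of step (5) is not shown.

```python
import random


def transformed_extremes(xs, ys, m, theta_hat, dtheta_hat):
    """Steps (2)-(4): localize, take block-wise extremes, shift back.

    xs, ys     -- design points x_{j,n} and responses Y_{j,n} for j in J_n
    theta_hat  -- pilot estimator of theta (hypothetical callable)
    dtheta_hat -- its derivative (hypothetical callable)
    Returns (s_prime, S_prime); entries are None for empty blocks.
    """
    s = [None] * m
    S = [None] * m
    for x, y in zip(xs, ys):
        k = min(int(x * m), m - 1)          # block I_k; the last block also contains 1
        xi = (k + 0.5) / m                   # centre xi_k of I_k
        z = y - theta_hat(xi) - dtheta_hat(x) * (x - xi)
        s[k] = z if s[k] is None else min(s[k], z)
        S[k] = z if S[k] is None else max(S[k], z)
    s_prime, S_prime = [], []
    for k in range(m):
        xi = (k + 0.5) / m
        s_prime.append(None if s[k] is None else s[k] + theta_hat(xi) + 1.0)
        S_prime.append(None if S[k] is None else S[k] + theta_hat(xi) - 1.0)
    return s_prime, S_prime


# Illustration with a linear target and an oracle "pilot" (not an actual estimator):
theta = lambda x: 0.3 + 0.7 * x
dtheta = lambda x: 0.7
rng = random.Random(0)
n, m = 2000, 20
xs = [(j - 1) / (n - 1) for j in range(1, n + 1, 2)]   # odd indices J_n, equidistant design
ys = [theta(x) + rng.uniform(-1.0, 1.0) for x in xs]   # U[-1, 1] errors
s_prime, S_prime = transformed_extremes(xs, ys, m, theta, dtheta)
```

For a linear $\vartheta$ and the oracle pilot, $Z_j$ reduces to $\varepsilon_{j,n}$ up to rounding. So $s_k'$ lies slightly above and $S_k'$ slightly below $\vartheta(\xi_k)$, at a distance of order $m/n$.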

In this algorithmic description we could do without subtracting and adding the pilot estimator itself (i.e., only use the derivative) in steps (2) and (4). In the proof, however, this localization permits an easy sufficiency argument for the local extremes. In a nutshell, asymptotic equivalence is achieved by considering block-wise extreme values in the regression experiment. This is combined with a pre- and post-processing procedure (localization step) that performs a linear correction on each block. The simpler block-wise constant approximation approach of Brown and Low (1996) does not work here, since we need a much higher approximation order.

Throughout, we write const. for a generic positive constant which may change its value from line to line and depends neither on the parameter $\vartheta$ nor on the sample size $n$. Similarly, the Landau symbols $O$, $o$ and the asymptotic order symbol $\asymp$ denote bounds that are uniform with respect to $\vartheta$ and $n$.

3. Pilot estimators 



In order to prove Theorem 2.1 a localization strategy is required, as in Nussbaum (1996) for the density estimation problem. To that end we construct pilot estimators of the target function $\vartheta$ and its derivative in both experiments $\mathcal{A}_n$ and $\mathcal{B}_n$.

Let us fix the estimation point $x_0\in[0,1]$ and apply a local polynomial estimation approach. We introduce the neighbourhood $U_h=[x_0-h,x_0+h]$ for $x_0\in[h,1-h]$ and the one-sided analogues $U_h=[0,2h]$ for $x_0\in[0,h)$ and $U_h=[1-2h,1]$ for $x_0\in(1-h,1]$. We introduce the set $\Pi:=\Pi_2(U_h)$ of quadratic polynomials on $U_h$. Standard approximation theory (by a Taylor series argument) gives for $h\downarrow0$

$$\gamma_h:=\sup_{\vartheta\in\Theta}\min_{p\in\Pi}\max_{x\in U_h}\Big(h^{-(2+\alpha)}|\vartheta(x)-p(x)|+h^{-(1+\alpha)}|\vartheta'(x)-p'(x)|\Big)\le\text{const.}<\infty,$$

where the constant does not depend on $h$.

Definition 3.1. We call $\vartheta\in\Pi$ in experiment $\mathcal{A}_n$ locally admissible at $x_0$ if

$$\max_{j:\,x_{j,n}\in U_h}|Y_{j,n}-\vartheta(x_{j,n})|\le1+\gamma_hh^{2+\alpha}$$

holds. Similarly, in experiment $\mathcal{B}_n$ we call $\vartheta\in\Pi$ locally admissible at $x_0$ if

$$X_1(\{x\in U_h,\ y<\vartheta(x)-\gamma_hh^{2+\alpha}\})=0\quad\text{and}\quad X_2(\{x\in U_h,\ y>\vartheta(x)+\gamma_hh^{2+\alpha}\})=0$$

hold. Our estimator $\hat\vartheta_{n,h}(x_0)$ is just any locally admissible $\hat\vartheta_{n,h}\in\Pi$, evaluated at $x_0$ and selected as a measurable function of the data (by the measurable selection theorem).

Note that the band size, enlarged by $\gamma_hh^{2+\alpha}$, guarantees that $\hat\vartheta_{n,h}$ exists, since the minimizer $\vartheta_h\in\Pi$ in the definition of $\gamma_h$ is eligible. The following result gives the pointwise risk bounds for the regression function and its derivative. Their orders are $O(n^{-s/(s+1)})$ and $O(n^{-(s-1)/(s+1)})$, respectively, where $s=2+\alpha$ denotes the regularity in a Hölder class.
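The band condition of Definition 3.1 in experiment $\mathcal{A}_n$ is straightforward to check for a given candidate. The following minimal sketch assumes the quadratic is parametrized by hypothetical coefficients $(c_0,c_1,c_2)$ in the centred basis $1,(x-x_0),(x-x_0)^2$. How an admissible candidate is selected measurably (e.g. by a minimax fit) is not shown.

```python
import random


def neighbourhood(x0, h):
    """U_h from Section 3, with the one-sided versions near the endpoints of [0, 1]."""
    if x0 < h:
        return 0.0, 2.0 * h
    if x0 > 1.0 - h:
        return 1.0 - 2.0 * h, 1.0
    return x0 - h, x0 + h


def is_locally_admissible(coeffs, xs, ys, x0, h, gamma_h, alpha):
    """Check max_{x_j in U_h} |Y_j - p(x_j)| <= 1 + gamma_h * h^(2 + alpha) (Definition 3.1)."""
    lo, hi = neighbourhood(x0, h)
    c0, c1, c2 = coeffs
    band = 1.0 + gamma_h * h ** (2.0 + alpha)
    residuals = [abs(y - (c0 + c1 * (x - x0) + c2 * (x - x0) ** 2))
                 for x, y in zip(xs, ys) if lo <= x <= hi]
    return bool(residuals) and max(residuals) <= band


# Usage: quadratic target theta(x) = 1 + 2x - x^2 with U[-1, 1] errors.
rng = random.Random(2)
n = 1000
xs = [(j - 1) / (n - 1) for j in range(1, n + 1)]
ys = [1.0 + 2.0 * x - x * x + rng.uniform(-1.0, 1.0) for x in xs]
x0, h = 0.5, 0.1
truth = (1.0 + 2.0 * x0 - x0 * x0, 2.0 - 2.0 * x0, -1.0)   # exact Taylor coefficients at x0
shifted = (truth[0] + 0.5, truth[1], truth[2])
print(is_locally_admissible(truth, xs, ys, x0, h, gamma_h=1.0, alpha=0.5),
      is_locally_admissible(shifted, xs, ys, x0, h, gamma_h=1.0, alpha=0.5))
```

The true quadratic is always admissible, since its residuals are the errors themselves. A vertically shifted candidate is rejected as soon as a single error falls far enough into the opposite tail. This one-sided mechanism drives the exponential inequality in the proof below.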



As an application of our asymptotic equivalence, we show in Section 8 below the optimality of these rates in a minimax sense. The upper bound proof relies on entropy arguments and norm equivalences for polynomials. It could easily be extended to more general local polynomial estimation and $L^p$-loss functions.



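Proposition 3.1. Select the bandwidth $h$ such that $h\asymp n^{-1/(3+\alpha)}$. Then we have in experiment $\mathcal{A}_n$ as well as in experiment $\mathcal{B}_n$

$$\sup_{\vartheta\in\Theta}\sup_{x_0\in[0,1]}E_\vartheta\Big(n^{2(2+\alpha)/(3+\alpha)}\big|\hat\vartheta_{n,h}(x_0)-\vartheta(x_0)\big|^2+n^{2(1+\alpha)/(3+\alpha)}\big|\hat\vartheta_{n,h}'(x_0)-\vartheta'(x_0)\big|^2\Big)<\infty.$$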
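The choice of bandwidth can be read off from the tail bound derived in the proof below, whose exponent is of order $Rnh^{3+\alpha}$. Balancing this exponent gives the rates stated above. A short worked derivation with $s=2+\alpha$:

```latex
n h^{3+\alpha} \asymp 1
\;\Longrightarrow\; h \asymp n^{-1/(3+\alpha)},
\qquad
h^{2+\alpha} \asymp n^{-(2+\alpha)/(3+\alpha)} = n^{-s/(s+1)},
\qquad
h^{1+\alpha} \asymp n^{-(1+\alpha)/(3+\alpha)} = n^{-(s-1)/(s+1)}.
```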



Proof of Proposition 3.1: We shall need the following bounds in $\Pi=\Pi_2(U_h)$ from DeVore and Lorentz (1993):

- $\|p\|_{L^\infty(U_h)}\le8h^{-1}\|p\|_{L^1(U_h)}$ (their Theorem IV.2.6);
- $\|p'\|_{L^\infty(U_h)}\le c_0h^{-1}\|p\|_{L^\infty(U_h)}$ (their Thm. IV.2.7).

Their proof of Thm. IV.2.6 establishes $|p(x)|\ge(1-4(x-x_M)/h)\|p\|_\infty$ for $x_M:=\operatorname{argmax}_{x\in U_h}|p(x)|$ and $x_M\le x\le x_M+h/4$, assuming without loss of generality that $x_M$ lies in the left half of $U_h$. From this, uniformly over $x_0$,

$$\|p\|_{n,h,1}:=\frac{1}{\#\{j:x_{j,n}\in U_h\}}\sum_{j:\,x_{j,n}\in U_h}|p(x_{j,n})|\ge\text{const.}\cdot|p(x_M)|=\text{const.}\cdot\|p\|_{L^\infty(U_h)}$$

is derived.

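Let us start with the regression experiment $\mathcal{A}_n$. We apply a standard chaining argument in the finite-dimensional space $\Pi$ together with an approximation argument. From above we have $\|p\|_{L^\infty(U_h)}/\|p\|_{n,h,1}\asymp1$ as well as $\|p\|_{n,h,1}\ge c_1|p(x_0)|$ with some $c_1>0$, uniformly in $p\in\Pi$. Fix $R>2$.

For every $\delta>0$ we can find elements $(p_l)_{l\ge1}$ that form a $\delta$-net in $\Pi\cap\{\|p\|_{n,h,1}\ge c_1\max(1,c_0)(R-1)\gamma_hh^{2+\alpha}\}$ with respect to the $L^\infty(U_h)$-norm, satisfying $\|p_l\|_{n,h,1}\asymp\delta l^{1/3}$ as $l\to\infty$. To see this, note that by the above norm equivalences, $\Pi\cap\{\|p\|_{n,h,1}\ge c_1\max(1,c_0)(R-1)\gamma_hh^{2+\alpha}\}$ with maximum norm is isometric to $\mathbb{R}^3\cap\{|x|\ge c_1\max(1,c_0)(R-1)\gamma_hh^{2+\alpha}\}$ with the Euclidean metric, uniformly for $h\to0$ and $nh\to\infty$. Then use standard coverings of Euclidean balls, e.g. Lemma 2.5 in van de Geer (2006).

With $\vartheta_h$ the minimizer from the definition of $\gamma_h$, we obtain

$$\begin{aligned}
&P_\vartheta\Big(\exists p\in\Pi:\max_{j:\,x_{j,n}\in U_h}|Y_{j,n}-p(x_{j,n})|\le1+\gamma_hh^{2+\alpha},\\
&\qquad\max\big(h^{-(2+\alpha)}|p(x_0)-\vartheta(x_0)|,\,h^{-(1+\alpha)}|p'(x_0)-\vartheta'(x_0)|\big)>R\gamma_h\Big)\\
&=P_\vartheta\Big(\exists p\in\Pi:\max_{j:\,x_{j,n}\in U_h}|\varepsilon_{j,n}-(p(x_{j,n})-\vartheta(x_{j,n}))|\le1+\gamma_hh^{2+\alpha},\\
&\qquad\max\big(h^{-(2+\alpha)}|p(x_0)-\vartheta(x_0)|,\,h^{-(1+\alpha)}|p'(x_0)-\vartheta'(x_0)|\big)>R\gamma_h\Big)\\
&\le P_\vartheta\Big(\exists p\in\Pi:\max_{j:\,x_{j,n}\in U_h}|\varepsilon_{j,n}-(p(x_{j,n})-\vartheta_h(x_{j,n}))|\le1+2\gamma_hh^{2+\alpha},\\
&\qquad\|p-\vartheta_h\|_{n,h,1}>\max(1,c_0)c_1(R-1)\gamma_hh^{2+\alpha}\Big)\\
&\le P_\vartheta\Big(\exists l\ge1:\max_{j:\,x_{j,n}\in U_h}|\varepsilon_{j,n}-p_l(x_{j,n})|\le1+2\gamma_hh^{2+\alpha}+\delta\Big)\\
&\le\sum_{l\ge1}P_\vartheta\Big(\max_{j:\,x_{j,n}\in U_h}|\varepsilon_{j,n}-p_l(x_{j,n})|\le1+2\gamma_hh^{2+\alpha}+\delta\Big).
\end{aligned}$$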
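From $f_\varepsilon(-1)>0$, $f_\varepsilon(+1)>0$ and the Lipschitz continuity of $f_\varepsilon$ within $[-1,1]$ we infer that any $\varepsilon_{j,n}$ satisfies

$$\min\big(P(\varepsilon_{j,n}>1-\kappa),\,P(\varepsilon_{j,n}<-1+\kappa)\big)\ge c\kappa$$

for some constant $c>0$ and all $\kappa\in(0,1)$. We derive an exponential inequality for any $f:U_h\to\mathbb{R}$ and $\Delta>0$:

$$\begin{aligned}
P\Big(\max_{j:\,x_{j,n}\in U_h}|\varepsilon_{j,n}-f(x_{j,n})|\le1+\Delta\Big)
&\le\prod_{j:\,x_{j,n}\in U_h}\Big(1-\min\big(P(\varepsilon_{j,n}>1+\Delta-|f(x_{j,n})|),\,P(\varepsilon_{j,n}<-1-\Delta+|f(x_{j,n})|)\big)\Big)\\
&\le\exp\Big(\sum_{j:\,x_{j,n}\in U_h}\log\big(1-c(|f(x_{j,n})|-\Delta)_+\big)\Big)\\
&\le\exp\Big(-c\sum_{j:\,x_{j,n}\in U_h}(|f(x_{j,n})|-\Delta)_+\Big)\\
&\le\exp\big(-\text{const.}\cdot nh(\|f\|_{n,h,1}-\Delta)\big),
\end{aligned}$$

using $\log(1+x)\le x$. We therefore choose $\delta=R\gamma_hh^{2+\alpha}$ and arrive at

$$\begin{aligned}
&P_\vartheta\Big(\exists p\in\Pi:p\text{ is locally admissible},\ \max\big(h^{-(2+\alpha)}|p(x_0)-\vartheta(x_0)|,\,h^{-(1+\alpha)}|p'(x_0)-\vartheta'(x_0)|\big)>R\gamma_h\Big)\\
&\le\sum_{l\ge1}\exp\big(-\text{const.}\cdot nh(\delta+\gamma_hh^{2+\alpha})\,l^{1/3}\big)=O\big(\exp(-\text{const.}\cdot Rnh^{3+\alpha})\big).
\end{aligned}$$

Substituting $h\asymp n^{-1/(3+\alpha)}$, we conclude that uniformly over $R\ge2$

$$P_\vartheta\big(h^{-(2+\alpha)}|\hat\vartheta_{n,h}(x_0)-\vartheta(x_0)|>R\gamma_h\big)=O(\exp(-\text{const.}\cdot R)),$$
$$P_\vartheta\big(h^{-(1+\alpha)}|\hat\vartheta_{n,h}'(x_0)-\vartheta'(x_0)|>R\gamma_h\big)=O(\exp(-\text{const.}\cdot R)).$$

Integrating out these exponential tail bounds yields the desired moment bound in experiment $\mathcal{A}_n$.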

All the results obtained so far remain valid for the PPP experiment $\mathcal{B}_n$ under three changes:

- the empirical norm is replaced by the rescaled $L^1(U_h)$-norm $\|p\|_{1,U_h}:=|U_h|^{-1}\int_{U_h}|p(x)|\,dx$;
- the admissibility conditions are exchanged;
- the following (easier) exponential inequality is used:

$$\begin{aligned}
&P_\vartheta\Big(X_1(\{x\in U_h,\ y<\vartheta(x)+f(x)-\Delta\})=0,\ X_2(\{x\in U_h,\ y>\vartheta(x)+f(x)+\Delta\})=0\Big)\\
&=P_0\big(X_1(\{x\in U_h,\ y<f(x)-\Delta\})=0\big)\,P_0\big(X_2(\{x\in U_h,\ y>f(x)+\Delta\})=0\big)\\
&=\exp\Big(-nf_\varepsilon(-1)\int_{U_h}(f(x)-\Delta)_+f_D(x)\,dx\Big)\cdot\exp\Big(-nf_\varepsilon(1)\int_{U_h}(-f(x)-\Delta)_+f_D(x)\,dx\Big)\\
&\le\exp\big(-c'nh(\|f\|_{1,U_h}-\Delta)\big)
\end{aligned}$$

with some constant $c'>0$. □
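For uniform errors on $[-1,1]$ the first exponential inequality can be checked exactly. In that case $P(\varepsilon_{j,n}<-1+\kappa)=\kappa/2$, i.e. $c=1/2$. The following is a minimal sketch under this assumption; the values of $f$ on the design points are hypothetical.

```python
import math


def exact_probability(fvals, delta):
    """P(max_j |eps_j - f(x_j)| <= 1 + delta) for i.i.d. eps_j ~ U[-1, 1]."""
    prob = 1.0
    for a in fvals:
        # |eps - a| <= 1 + delta  <=>  eps in [a - 1 - delta, a + 1 + delta], cut to [-1, 1]
        lo = max(-1.0, a - 1.0 - delta)
        hi = min(1.0, a + 1.0 + delta)
        prob *= max(0.0, hi - lo) / 2.0
    return prob


def exponential_bound(fvals, delta, c=0.5):
    """exp(-c * sum_j (|f(x_j)| - delta)_+), the bound from the proof with c = 1/2."""
    return math.exp(-c * sum(max(abs(a) - delta, 0.0) for a in fvals))


fvals = [0.3 * math.sin(7.0 * j / 50) for j in range(50)]   # hypothetical f(x_j)
for delta in (0.0, 0.05, 0.1):
    print(delta, exact_probability(fvals, delta), exponential_bound(fvals, delta))
```

Each factor equals $1-(|f(x_j)|-\Delta)_+/2$, and the bound is simply $1-x\le e^{-x}$ applied factor by factor.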



4. Design adjustment for the regression experiment 

We use a piecewise constant approximation strategy and introduce the intervals

$$I_{k,n}=[k/m,(k+1)/m),\quad k=0,\dots,m-2,\qquad\text{and}\qquad I_{m-1,n}=[(m-1)/m,1] \qquad (4.1)$$

for some integer $m$. For any design point $x_{j,n}\in I_{k,n}$ we introduce the centre of the interval

$$\xi_{j,n}:=(k+1/2)/m\quad\text{for }x_{j,n}\in I_{k,n}. \qquad (4.2)$$

Now we apply a sample splitting scheme and write $J_n$ for the collection of odd $j\in\{1,\dots,n\}$. The experiment $\mathcal{A}_n$ is considered as the totality of the two independent data sets $X=(Y_{j+1,n})_{j\in J_n\setminus\{n\}}$ and $Y'=(Y_{j,n})_{j\in J_n}$.

Subsequently, we shall not touch upon X to establish asymptotic equivalence, 
but just assume the existence of sufficiently good estimators based on the data X. 
Therefore, we forget about the specific definition of X and write X* instead. 

Definition 4.1. Let $X^*$ be an arbitrary observation in a Polish space, which is independent of $Y'$. We generalize the experiment $\mathcal{A}_n$ to $\mathcal{A}_n^*$, which consists of the data $Y'$ and $X^*$.

The original experiment $\mathcal{A}_n$ is still included by putting $X^*=X$. This allows us to use the following results repeatedly later on, also when $X^*$ denotes a PPP observation.

In a first step we show asymptotic equivalence of the regression experiment $\mathcal{A}_n^*$ with the same experiment in which, for $j\in J_n$, the regression function is observed at the interval centres $\xi_{j,n}$.

Definition 4.2. In experiment $\mathcal{C}_n$ we observe the vector $X^*$ as under experiment $\mathcal{A}_n^*$ and, independently, the vector $Z=(Z_{j,n})_{j\in J_n}$ with the components

$$Z_{j,n}=\vartheta(\xi_{j,n})+\varepsilon_{j,n},\qquad j\in J_n.$$
Lemma 4.1. Choose $m\in\mathbb{N}$ such that $m^{-1}=o(n^{-1/2})$ holds and assume that an estimator $\hat\vartheta'$ can be constructed based on the data set $X^*$ with

$$\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]}E_\vartheta|\hat\vartheta'(x)-\vartheta'(x)|=o(mn^{-1}).$$

Then the experiments $\mathcal{A}_n^*$ and $\mathcal{C}_n$ are asymptotically equivalent.



Proof of Lemma 4.1: The observations $Y'$ from the experiment $\mathcal{A}_n^*$ are transformed into the data set $Y$ with the components

$$Y_{j,n}=Y'_{j,n}-\hat\vartheta'(x_{j,n})(x_{j,n}-\xi_{j,n})$$

for all $j\in J_n$. The data set $X^*$ is not affected by this transformation. As $\hat\vartheta'$ is based on the data $X^*$, this transformation is invertible, so that the original data can be uniquely reconstructed from the transformed ones. Hence observing $(X^*,Y')$ is equivalent to observing $(X^*,Y)$. Therefore, for any measurable functional $R$ with $\|R\|_\infty\le1$ we observe that

$$\begin{aligned}
|E_\vartheta R(X^*,Y)-E_\vartheta R(X^*,Z)|&\le E_\vartheta\big|E_\vartheta\{R(X^*,Y)\,|\,X^*\}-E_\vartheta\{R(X^*,Z)\,|\,X^*\}\big| \qquad (4.3)\\
&\le\sum_{j\in J_n}E_\vartheta\big\|f_{Y_{j,n}|X^*}-f_{Z_{j,n}|X^*}\big\|_1, \qquad (4.4)
\end{aligned}$$

where $\|\cdot\|_1$ denotes the $L^1(\mathbb{R})$-norm; in general, $f_{Y|X}$ stands for the conditional density of $Y$ given $X$. Here we have used the conditional independence of the $Y_{j,n}$ and the $Z_{j,n}$ given $X^*$. We have also used an elementary telescopic sum argument for the $L^1(\mathbb{R})$-distance of the multivariate conditional densities of $Y$ and $Z$ given $X^*$. By the Lipschitz continuity of $\varphi$ we obtain

$$\big\|f_{Y_{j,n}|X^*}-f_{Z_{j,n}|X^*}\big\|_1\le2\|\varphi\|_\infty|\Delta_{1,j,n}|+\int_{-1}^{1}|\varphi(x+\Delta_{1,j,n})-\varphi(x)|\,dx\le4C_\varphi|\Delta_{1,j,n}|, \qquad (4.5)$$

where

$$\Delta_{1,j,n}=\vartheta(x_{j,n})-\vartheta(\xi_{j,n})-\hat\vartheta'(x_{j,n})(x_{j,n}-\xi_{j,n}).$$

We conclude that the total variation distance between $(X^*,Y)$ and $(X^*,Z)$ is bounded from above by

$$\text{const.}\cdot\sum_{j\in J_n}E_\vartheta(|\Delta_{1,j,n}|).$$

By the smoothness constraints imposed on the parameter class $\Theta$ we derive that

$$|\Delta_{1,j,n}|\le\text{const.}\cdot\big(m^{-2}+|\hat\vartheta'(x_{j,n})-\vartheta'(x_{j,n})|\,m^{-1}\big).$$

Using $m^{-2}=o(n^{-1})$ and the convergence rate of $\hat\vartheta'$, we conclude that the Le Cam distance between the experiments $\mathcal{A}_n^*$ and $\mathcal{C}_n$ tends to zero uniformly in $\vartheta$, which gives the assertion of the lemma. □
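Inequality (4.5) can be checked numerically. The following minimal sketch assumes the hypothetical error density $f_\varepsilon(x)=\mathbf{1}_{[-1,1]}(x)(1/2+x/4)$. For this density $\varphi$ (constantly extrapolated) has Lipschitz constant $1/4$ and supremum $3/4$, so $C_\varphi=1$ satisfies (2.4).

```python
def f_eps(x):
    """Hypothetical non-regular error density 1_[-1,1](x) * (1/2 + x/4)."""
    return 0.5 + 0.25 * x if -1.0 <= x <= 1.0 else 0.0


def l1_shift_distance(density, shift, grid=100000):
    """Midpoint-rule approximation of int |density(x) - density(x - shift)| dx."""
    lo, hi = -1.0 - abs(shift), 1.0 + abs(shift)
    step = (hi - lo) / grid
    total = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * step
        total += abs(density(x) - density(x - shift))
    return total * step


C_phi = 1.0
for shift in (0.01, 0.1, 0.3):
    print(shift, l1_shift_distance(f_eps, shift), 4 * C_phi * shift)
```

For the shift $0.1$ the exact value is $0.02625+0.07375+0.0475=0.1475$, well below the bound $0.4$. Both the jumps at $\pm1$ and the Lipschitz part contribute terms that are linear in the shift. This is why the bound is of order $|\Delta_{1,j,n}|$ rather than $\Delta_{1,j,n}^2$.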

Usually, the bound on the total variation of product measures used in the proof is suboptimal, but here the order is optimal due to the singular parts in the measures. Note also that the data $Z_{j,n}$ may be viewed as random responses drawn from a regression function which is locally constant on the intervals $I_{k,n}$, with the values $\vartheta(\xi_{j,n})$ when $x_{j,n}\in I_{k,n}$.

5. Asymptotic equivalence for step functions 



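We revisit the experiment $\mathcal{C}_n$ from Definition 4.2. The data $Z_{j,n}$ may be transformed into

$$\tilde Z_{j,n}=Z_{j,n}-\hat\vartheta(\xi_{j,n}),\qquad j\in J_n,$$

where $\hat\vartheta$ denotes a preliminary estimator of $\vartheta$ which is based on the data from $X^*$ as contained in the experiment $\mathcal{C}_n$. Again this transformation is invertible, so that the experiment $\mathcal{C}_n$ is equivalent to the experiment $\mathcal{C}_n'$ under which one observes the data $X^*$ and the vector $\tilde Z=(\tilde Z_{j,n})_{j\in J_n}$. The $\tilde Z_{j,n}$, $j\in J_n$, are conditionally independent given $X^*$ and have the conditional densities

$$f_\varepsilon(x-\Delta_{0,j,n})=\varphi(x-\Delta_{0,j,n})\mathbf{1}_{[\Delta_{0,j,n}-1,\,\Delta_{0,j,n}+1]}(x)\quad\text{with}\quad\Delta_{0,j,n}=\vartheta(\xi_{j,n})-\hat\vartheta(\xi_{j,n}). \qquad (5.1)$$

The next key step is to replace these densities by densities with unshifted $\varphi$, for which local minima and maxima will turn out to be sufficient statistics.

Definition 5.1. Let $W_{j,n}$, $j\in J_n$, be random variables that are independent conditionally on $X^*$, with respective densities

$$f_{W,j}(x)=\varphi(x)\Big(\int_{\Delta_{0,j,n}-1}^{\Delta_{0,j,n}+1}\varphi(t)\,dt\Big)^{-1}\mathbf{1}_{[\Delta_{0,j,n}-1,\,\Delta_{0,j,n}+1]}(x),\qquad j\in J_n,$$

where $\Delta_{0,j,n}$ is given in (5.1). The experiment in which $X^*$ and the $W_{j,n}$, $j\in J_n$, are observed for $\vartheta\in\Theta$ is denoted by $\mathcal{D}_n$.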
Lemma 5.1. Suppose that an estimator $\hat\vartheta$ of $\vartheta$ can be constructed based on the data set $X^*$ such that

$$\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]}E_\vartheta|\hat\vartheta(x)-\vartheta(x)|^2=O(n^{-1-\delta}) \qquad (5.2)$$

for some $\delta>0$. Then the experiments $\mathcal{C}_n$ and $\mathcal{D}_n$ are asymptotically equivalent.



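Proof of Lemma 5.1: We use Le Cam's inequality and the subadditivity of the squared Hellinger distance $H^2$ for product measures (cf. Section 2.4 in Tsybakov (2009) or Appendix 9.1 in Reiß (2008)). We deduce that for any measurable functional $R$ with $\|R\|_\infty\le1$ we have

$$\begin{aligned}
|E_\vartheta R(X^*,\tilde Z)-E_\vartheta R(X^*,W)|&\le E_\vartheta\int\!\cdots\!\int\Big|\prod_{j\in J_n}f_\varepsilon(y_j-\Delta_{0,j,n})-\prod_{j\in J_n}f_{W,j}(y_j)\Big|\,dy\\
&\le2\Big(\sum_{j\in J_n}E_\vartheta H^2\big(f_{W,j},f_\varepsilon(\cdot-\Delta_{0,j,n})\big)\Big)^{1/2}, \qquad (5.3)
\end{aligned}$$

where the expectation is taken over $\Delta_{0,j,n}$. Hence, it remains to be shown that the sum converges to zero uniformly with respect to $\vartheta\in\Theta$. That sum equals

$$\sum_{j\in J_n}E_\vartheta\int_{\Delta_{0,j,n}-1}^{\Delta_{0,j,n}+1}\Big(\sqrt{\varphi(x)}\Big(\int_{\Delta_{0,j,n}-1}^{\Delta_{0,j,n}+1}\varphi(t)\,dt\Big)^{-1/2}-\sqrt{\varphi(x-\Delta_{0,j,n})}\Big)^2dx\le4C_\varphi^2\Big(2+\big\{\inf_{|x|\le1}\varphi(x)\big\}^{-1}\Big)\sum_{j\in J_n}E_\vartheta\Delta_{0,j,n}^2,$$

since $\varphi$ is strictly positive, continuous and satisfies condition (2.4). By the imposed convergence rate of the estimator $\hat\vartheta$, the supremum taken over $\vartheta\in\Theta$ tends to zero at the rate $O(n^{-\delta})$, and the proof is complete. □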



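The conditional joint density of the $W_{j,n}$, $j\in J_n$, given $X^*$ from the experiment $\mathcal{D}_n$ can be represented by

$$\begin{aligned}
f_W(w)=\prod_{j\in J_n}f_{W,j}(w_j)&=\Big(\prod_{j\in J_n}\varphi(w_j)\Big)\Big(\prod_{j\in J_n}\int_{\Delta_{0,j,n}-1}^{\Delta_{0,j,n}+1}\varphi(t)\,dt\Big)^{-1}\\
&\quad\cdot\prod_{k=0}^{m-1}\mathbf{1}\big(\min\{w_j:x_{j,n}\in I_{k,n}\}\ge\Delta_{0,j(k),n}-1\big)\cdot\mathbf{1}\big(\max\{w_j:x_{j,n}\in I_{k,n}\}\le\Delta_{0,j(k),n}+1\big), \qquad (5.4)
\end{aligned}$$

where the $I_{k,n}$ are as in Section 4, $j(k)=\min\{l\in J_n:x_{l,n}\in I_{k,n}\}$ and $w=(w_j)_{j\in J_n}$. Here we use that $\Delta_{0,j,n}$ depends on $j$ only through $\xi_{j,n}$ and is therefore constant on each $I_{k,n}$. Note that the parameter $\vartheta$ is included in the term $\Delta_{0,j(k),n}$.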




Definition 5.2. In experiment $\mathcal{E}_n$ only the data $(X^*,s_{k,n},S_{k,n})$, $k=0,\dots,m-1$, with

$$s_{k,n}=\min\{W_{j,n}:x_{j,n}\in I_{k,n}\},\qquad S_{k,n}=\max\{W_{j,n}:x_{j,n}\in I_{k,n}\},$$

are observed for $\vartheta\in\Theta$.



An inspection of (5.4) shows that $(X^*,s_{k,n},S_{k,n})$, $k=0,\dots,m-1$, is a sufficient statistic for the whole empirical information contained in $(X^*,\{W_{j,n}:j\in J_n\})$, by the Fisher-Neyman factorization theorem.

Sufficiency implies equivalence (e.g. Lemma 3.2 in Brown and Low (1996)) and 
we have 

Lemma 5.2. Experiments $\mathcal{D}_n$ and $\mathcal{E}_n$ are equivalent.
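The factorization behind Lemma 5.2 is easiest to see for uniform errors. There $\varphi\equiv1/2$ (constantly extrapolated), and hence $f_{W,j}=\tfrac12\mathbf{1}_{[\Delta_{0,j,n}-1,\Delta_{0,j,n}+1]}$. The following minimal sketch works under that assumption. It compares the likelihood computed from all observations with the one computed only from block counts, minima and maxima. The block shifts play the role of $\Delta_{0,j(k),n}$ and are hypothetical.

```python
import random


def likelihood_full(ws, blocks, deltas):
    """prod_j f_{W,j}(w_j) with f_{W,j} = 1/2 on [delta_k - 1, delta_k + 1], k the block of j."""
    lik = 1.0
    for w, k in zip(ws, blocks):
        lik *= 0.5 if deltas[k] - 1.0 <= w <= deltas[k] + 1.0 else 0.0
    return lik


def likelihood_extremes(counts, mins, maxs, deltas):
    """The same likelihood computed from block counts, minima and maxima only."""
    lik = 1.0
    for c, lo, hi, d in zip(counts, mins, maxs, deltas):
        lik *= 0.5 ** c if (lo >= d - 1.0 and hi <= d + 1.0) else 0.0
    return lik


rng = random.Random(3)
m, per_block = 4, 15
true_deltas = [0.1, -0.2, 0.05, 0.0]
blocks = [k for k in range(m) for _ in range(per_block)]
ws = [true_deltas[k] + rng.uniform(-1.0, 1.0) for k in blocks]
counts = [blocks.count(k) for k in range(m)]
mins = [min(w for w, b in zip(ws, blocks) if b == k) for k in range(m)]
maxs = [max(w for w, b in zip(ws, blocks) if b == k) for k in range(m)]
```

For a general $\varphi$, the extra factor $\prod_j\varphi(w_j)$ in (5.4) does not involve $\vartheta$, so the same conclusion holds by the factorization theorem.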

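In the following we study the conditional distribution of $(s_{k,n},S_{k,n})$ given $X^*$. Note that, conditionally on $X^*$, the $(s_{k,n},S_{k,n})$ are independent for $k=0,\dots,m-1$, as the intervals $I_{k,n}$ are disjoint. Writing $l_{k,n}:=\#\{j\in J_n:x_{j,n}\in I_{k,n}\}$, we derive that

$$P[s_{k,n}>x,\,S_{k,n}\le y\,|\,X^*]=P[W_{j,n}\in(x,y]\ \forall j\in J_n\text{ with }x_{j,n}\in I_{k,n}\,|\,X^*]=\Big(\int_x^yf_{W,j(k)}(t)\,dt\Big)^{l_{k,n}}$$

for $y>x$. Thus we obtain the conditional joint density of $(s_{k,n},S_{k,n})$ via

$$\begin{aligned}
f_{(s_{k,n},S_{k,n})}(x,y)&=-\frac{\partial^2}{\partial x\,\partial y}P[s_{k,n}>x,\,S_{k,n}\le y\,|\,X^*]\\
&=A_{k,n}(x,y)\cdot l_{k,n}(l_{k,n}-1)\,f_{W,j(k)}(x)\,f_{W,j(k)}(y)\,\mathbf{1}_{\{y>x\}},
\end{aligned}$$

where

$$A_{k,n}(x,y)=\Big(1-\int_{\Delta_{0,j(k),n}-1}^{x}f_{W,j(k)}(t)\,dt-\int_{y}^{\Delta_{0,j(k),n}+1}f_{W,j(k)}(t)\,dt\Big)^{l_{k,n}-2}.$$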

Definition 5.3. For each $k$, consider two random variables $s^*_{k,n}$ and $S^*_{k,n}$ that are independent conditionally on $X^*$, with conditional exponential densities

$$\begin{aligned}
f_{s^*_{k,n}}(x)&=(l_{k,n}-2)f_{W,j(k)}(\Delta_{0,j(k),n}-1)\exp\big(-(l_{k,n}-2)f_{W,j(k)}(\Delta_{0,j(k),n}-1)\,(x-\Delta_{0,j(k),n}+1)\big)\mathbf{1}_{[\Delta_{0,j(k),n}-1,\infty)}(x),\\
f_{S^*_{k,n}}(x)&=(l_{k,n}-2)f_{W,j(k)}(\Delta_{0,j(k),n}+1)\exp\big(-(l_{k,n}-2)f_{W,j(k)}(\Delta_{0,j(k),n}+1)\,(-x+\Delta_{0,j(k),n}+1)\big)\mathbf{1}_{(-\infty,\Delta_{0,j(k),n}+1]}(x),
\end{aligned}$$

and joint density $f_{(s^*_{k,n},S^*_{k,n})}(x,y)=f_{s^*_{k,n}}(x)f_{S^*_{k,n}}(y)$. Then the experiment $\mathcal{F}_n$ is obtained by observing $X^*$ as well as the tuples $(s^*_{k,n},S^*_{k,n})$, $k=0,\dots,m-1$, which are independent conditionally on $X^*$.
Lemma 5.3. Assume that $m\le\text{const.}\cdot n^{1-\delta}$ for some $\delta>0$ and that

$$\sup_{k=0,\dots,m-1}|\Delta_{0,j(k),n}|\le2C_\Theta\ \text{ a.s.},\qquad\forall\vartheta\in\Theta. \qquad (5.5)$$

Conditionally on the data set $X^*$, the squared Hellinger distance between $f_{(s^*_{k,n},S^*_{k,n})}$ and $f_{(s_{k,n},S_{k,n})}$ satisfies

$$H^2\big(f_{(s^*_{k,n},S^*_{k,n})},f_{(s_{k,n},S_{k,n})}\big)\le\text{const.}\cdot\{\log(n/m)\}^4(m/n)^2,$$

where const. is uniform with respect to $n$, $X^*$, $\vartheta$ and $k$.

Remark 5.1. This approximation result, together with the ensuing corollary, tells us that the number $m$ of intervals must be chosen of polynomially smaller order than $n^{2/3}$. To see that we cannot hope for a better approximation order, consider the simplest univariate case. Let $s:=\min(U_i,i=1,\dots,l)$ with $U_i$ i.i.d. uniform on $[0,1]$, and let $s'$ be exponentially distributed with intensity $l\in\mathbb{N}$. Already here we have for $l\to\infty$

$$H^2(f_s,f_{s'})\ge\int_0^1\big(\sqrt{l(1-x)^{l-1}}-\sqrt{l\exp(-lx)}\big)^2dx\asymp l^{-2}.$$
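The order $l^{-2}$ in Remark 5.1 can be confirmed numerically. After the substitution $u=lx$, the squared Hellinger distance becomes $\int_0^l\big(\sqrt{(1-u/l)^{l-1}}-e^{-u/2}\big)^2du+e^{-l}$. The last term is the mass of the exponential law beyond $1$. Our own second-order Taylor expansion of $\log(1-u/l)$ suggests that $l^2H^2$ tends to $1/2$. The minimal sketch below only illustrates this behaviour.

```python
import math


def hellinger_sq_min_uniform_vs_exp(l, steps=100000):
    """H^2 between the law of min(U_1, ..., U_l), U_i ~ U[0, 1], and Exp(l)."""
    du = l / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * du
        base = max(0.0, 1.0 - u / l)              # guard against tiny negative rounding
        diff = math.sqrt(base ** (l - 1)) - math.exp(-u / 2.0)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal rule
        total += weight * diff * diff
    return total * du + math.exp(-l)


for l in (50, 100, 400):
    h2 = hellinger_sq_min_uniform_vs_exp(l)
    print(l, h2, l * l * h2)
```

With $l\asymp n/m$ and $m$ blocks, the summed squared distances are of order $m^3/n^2$. This matches the requirement that $m$ be polynomially smaller than $n^{2/3}$.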

Corollary 5.1. We assume that an estimator $\hat\vartheta$ of $\vartheta$ can be constructed from the data $X^*$ such that (5.5) holds. For $m=O(n^{2/3-\delta})$ with some $\delta>0$, the experiments $\mathcal{E}_n$ and $\mathcal{F}_n$ are asymptotically equivalent as $n\to\infty$.



Proof of Corollary 5.1: We focus on the total variation distance between the distributions of the data $(X^*,\{(s^*_{k,n},S^*_{k,n}):k=0,\dots,m-1\})$ and $(X^*,\{(s_{k,n},S_{k,n}):k=0,\dots,m-1\})$. For any measurable functional $R$ on an appropriate domain with $\|R\|_\infty\le1$ we have

$$\begin{aligned}
&\big|E_\vartheta R(X^*,s^*_{0,n},S^*_{0,n},\dots,s^*_{m-1,n},S^*_{m-1,n})-E_\vartheta R(X^*,s_{0,n},S_{0,n},\dots,s_{m-1,n},S_{m-1,n})\big|\\
&\le2E_\vartheta\Big(\sum_{k=0}^{m-1}H^2\big(f_{(s^*_{k,n},S^*_{k,n})},f_{(s_{k,n},S_{k,n})}\big)\Big)^{1/2}\\
&\le\text{const.}\cdot n^{-3\delta/2}\log^2n.
\end{aligned}$$

Here we have used the conditional independence of the $(s^*_{k,n},S^*_{k,n})$, $k=0,\dots,m-1$, on the one hand and of the $(s_{k,n},S_{k,n})$, $k=0,\dots,m-1$, on the other hand. We have also used arguments as in the proof of Lemma 5.1, as well as Lemma 5.3 in the last line. Thus the total variation distance between the distributions of the data $(X^*,\{(s^*_{k,n},S^*_{k,n}):k=0,\dots,m-1\})$ and $(X^*,\{(s_{k,n},S_{k,n}):k=0,\dots,m-1\})$ converges to zero as $n\to\infty$, which proves the claim of the corollary. □



Proof of Lemma 5.3: The arguments of the Hellinger distance are usually densities, but its definition $H^2(f,g)=\int(\sqrt{f(x)}-\sqrt{g(x)})^2dx$ extends easily to all nonnegative functions $f,g\in L^1(\mathbb{R})$. This fact will be used in the sequel. Moreover, note that $l_{k,n}\asymp n/m\ge\text{const.}\cdot n^\delta$ holds uniformly over $k$ by our design assumption (2.3). We set

$$f_{1,k,n}(x,y):=\frac{(l_{k,n}-2)^2}{l_{k,n}(l_{k,n}-1)}\,f_{(s_{k,n},S_{k,n})}(x,y),$$

so that

$$H^2\big(f_{1,k,n},f_{(s_{k,n},S_{k,n})}\big)=\Big(1-\frac{l_{k,n}-2}{\sqrt{l_{k,n}(l_{k,n}-1)}}\Big)^2. \qquad (5.6)$$

Note that the support of $f_{(s_{k,n},S_{k,n})}$, hence of $f_{1,k,n}$, is included in the square $Q_{k,n} = [A_{0,j(k),n} - 1, A_{0,j(k),n} + 1]^2$. A sub-square is defined by

$$Q_{1,k,n} = [A_{0,j(k),n} - 1,\ A_{0,j(k),n} - 1 + a_{k,n}] \times [A_{0,j(k),n} + 1 - a_{k,n},\ A_{0,j(k),n} + 1] \subseteq Q_{k,n},$$

which will contain most of the probability mass, and we set $Q_{2,k,n} = Q_{k,n} \setminus Q_{1,k,n}$, where $a_{k,n} = d_0 l_{k,n}^{-1}\log l_{k,n}$ with a constant $d_0 > 0$, for $n$ sufficiently large. We split the Hellinger distance into integrals over disjoint domains so that



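$$\begin{aligned} H^2\big(f_{1,k,n}, f_{(s'_{k,n},S'_{k,n})}\big) &\le \int_{Q_{1,k,n}} \Big(\sqrt{f_{1,k,n}(x,y)} - \sqrt{f_{(s'_{k,n},S'_{k,n})}(x,y)}\Big)^2 dx\,dy + 2\int_{Q_{2,k,n}} f_{1,k,n}(x,y)\,dx\,dy \\ &\quad + 2P\big[s'_{k,n} > A_{0,j(k),n} - 1 + a_{k,n} \,\big|\, X^*\big] + 2P\big[S'_{k,n} < A_{0,j(k),n} + 1 - a_{k,n} \,\big|\, X^*\big] \\ &=: T_1 + T_2 + T_3 + T_4. \end{aligned} \tag{5.7}$$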
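The conditions (2.4) and (5.5) combined with the positivity of $\varphi$ imply that $\|f_{W,j(k)}\|_\infty \le C_5$ and that

$$\int_{A_{0,j(k),n}-1}^{x} f_{W,j(k)}(t)\,dt \ge \mathrm{const}\cdot(x - A_{0,j(k),n} + 1) \quad \forall x \in [A_{0,j(k),n}-1, A_{0,j(k),n}+1],$$

$$\int_{y}^{A_{0,j(k),n}+1} f_{W,j(k)}(t)\,dt \ge \mathrm{const}\cdot(A_{0,j(k),n} + 1 - y) \quad \forall y \in [A_{0,j(k),n}-1, A_{0,j(k),n}+1].$$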

As the Lebesgue measure of $Q_{k,n}$ equals 4 and is thus bounded, we deduce from the definitions of $f_{1,k,n}$ and $f_{(s_{k,n},S_{k,n})}$ that

$$T_2 \le c_u n^{-u}$$

for each $u > 0$ when the constant $d_0$ in the definition of $a_{k,n}$ is selected sufficiently large, where $c_u$ denotes a finite constant which depends neither on the data $X^*$ nor on $x, y$.

Concerning the terms $T_3$ and $T_4$, easy calculations yield that they are equal to $2\exp\{-a_{k,n}(l_{k,n}-2) f_{W,j(k)}(A_{0,j(k),n} \mp 1)\}$, respectively. We may use (2.4), (5.5) and $\varphi > 0$ to show that $f_{W,j(k)}(A_{0,j(k),n} \mp 1) \ge \mathrm{const} > 0$. Again, choosing the constant $d_0$ sufficiently large implies that $\max\{T_3, T_4\} \le c_u n^{-u}$ for any $u > 0$, with a constant $c_u$ which has the same properties as above.

Let us focus on the main term $T_1$. For $(x,y) \in Q_{1,k,n}$ we have

$$\log\Lambda_{k,n}(x,y) = (l_{k,n} - 2)\Big(-\int_{A_{0,j(k),n}-1}^{x} f_{W,j(k)}(t)\,dt - \int_{y}^{A_{0,j(k),n}+1} f_{W,j(k)}(t)\,dt\Big) + R_{1,k,n}(x,y),$$

where $\sup_{(x,y)\in Q_{1,k,n}}\max_{k=0,\ldots,m-1}|R_{1,k,n}(x,y)| \le \mathrm{const}\cdot l_{k,n}a_{k,n}^2 \asymp l_{k,n}^{-1}\log^2 l_{k,n}$ by the Taylor expansion of the logarithm. Furthermore, the functions to be integrated are locally approximated by constant functions,

$$\begin{aligned} -\int_{A_{0,j(k),n}-1}^{x} f_{W,j(k)}(t)\,dt - \int_{y}^{A_{0,j(k),n}+1} f_{W,j(k)}(t)\,dt &= -f_{W,j(k)}(A_{0,j(k),n} - 1)\,(x - A_{0,j(k),n} + 1) \\ &\quad - f_{W,j(k)}(A_{0,j(k),n} + 1)\,(-y + A_{0,j(k),n} + 1) + R_{2,k,n}(x,y), \end{aligned}$$

where $\sup_{(x,y)\in Q_{1,k,n}}\max_{k=0,\ldots,m-1}|R_{2,k,n}(x,y)| \le \mathrm{const}\cdot l_{k,n}^{-2}\log^2 l_{k,n}$, using the Lipschitz continuity of $\varphi$.

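We introduce $\tilde B_{k,n}(x,y) := \Lambda_{k,n}(x,y)\,(l_{k,n}-2)^2 f_{W,j(k)}(x)f_{W,j(k)}(y)$, so that $\tilde B_{k,n}(x,y)$ coincides with $f_{1,k,n}(x,y)$ on its restriction to $(x,y) \in Q_{1,k,n}$ for $n$ large enough, as well as

$$\bar B_{k,n}(x,y) := f_{(s'_{k,n},S'_{k,n})}(x,y)\,\frac{f_{W,j(k)}(x)\,f_{W,j(k)}(y)}{f_{W,j(k)}(A_{0,j(k),n}-1)\,f_{W,j(k)}(A_{0,j(k),n}+1)}.$$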

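We obtain

$$\tilde B_{k,n}^{1/2}(x,y) = \bar B_{k,n}^{1/2}(x,y)\exp\big(R_{1,k,n}(x,y)/2 + (l_{k,n}-2)R_{2,k,n}(x,y)/2\big) = \bar B_{k,n}^{1/2}(x,y) + \bar B_{k,n}^{1/2}(x,y)\,R_{3,k,n}(x,y),$$

where $\sup_{(x,y)\in Q_{1,k,n}}\max_{k=0,\ldots,m-1}|R_{3,k,n}(x,y)| \le \mathrm{const}\cdot l_{k,n}^{-1}\log^2 l_{k,n}$, and moreover

$$\bar B_{k,n}^{1/2}(x,y) = f_{(s'_{k,n},S'_{k,n})}^{1/2}(x,y) + f_{(s'_{k,n},S'_{k,n})}^{1/2}(x,y)\,R_{4,k,n}(x,y),$$

where

$$\begin{aligned} |R_{4,k,n}(x,y)| &\le \mathrm{const}\cdot\big(|f_{W,j(k)}(x) - f_{W,j(k)}(A_{0,j(k),n} - 1)| + |f_{W,j(k)}(y) - f_{W,j(k)}(A_{0,j(k),n} + 1)|\big) \\ &\le \mathrm{const}\cdot a_{k,n} \asymp l_{k,n}^{-1}\log l_{k,n}, \end{aligned}$$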

where the conditions (5.5), (2.4) and their consequences have been used. We conclude that

$$\tilde B_{k,n}^{1/2}(x,y) = f_{(s'_{k,n},S'_{k,n})}^{1/2}(x,y) + f_{(s'_{k,n},S'_{k,n})}^{1/2}(x,y)\,R_{5,k,n}(x,y),$$

where $\sup_{(x,y)\in Q_{1,k,n}}\max_{k=0,\ldots,m-1}|R_{5,k,n}(x,y)| \le \mathrm{const}\cdot l_{k,n}^{-1}\log^2 l_{k,n}$. Hence, the term $T_1$ is bounded from above by

$$T_1 \le \int_{Q_{1,k,n}} R_{5,k,n}^2(x,y)\,f_{(s'_{k,n},S'_{k,n})}(x,y)\,dx\,dy \le \mathrm{const}\cdot(\log^4 l_{k,n})\,l_{k,n}^{-2},$$

as the density $f_{(s'_{k,n},S'_{k,n})}$ integrates to one. By inserting the upper bounds on $T_1, \ldots, T_4$ into (5.7) and combining that result with (5.6), we complete the proof. □



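Definition 5.4. In experiment $\mathcal{G}_n$ we observe the data $(X^*, (d_{k,n}, D_{k,n})_{k=0,\ldots,m-1})$ for $\vartheta \in \Theta$, where $d_{0,n}, D_{0,n}, \ldots, d_{m-1,n}, D_{m-1,n}$ are independent random variables, also independent of $X^*$, with densities

$$f_{d_{k,n}}(x) = \rho_{k,n}\varphi(-1)\exp\big(-\rho_{k,n}\varphi(-1)\,[x - \vartheta(\zeta_{j(k),n})]\big)\,\mathbf{1}_{[\vartheta(\zeta_{j(k),n}),\infty)}(x),$$

$$f_{D_{k,n}}(x) = \rho_{k,n}\varphi(1)\exp\big(\rho_{k,n}\varphi(1)\,[x - \vartheta(\zeta_{j(k),n})]\big)\,\mathbf{1}_{(-\infty,\vartheta(\zeta_{j(k),n})]}(x),$$

where $\rho_{k,n} = (n/2)\int_{I_{k,n}} f_D(t)\,dt$ with $f_D$ as in (2.2).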



Lemma 5.4. We select $m$ such that $m = o(n^{2/3})$. Also we assume the existence of an estimator $\hat\vartheta$ of $\vartheta$ based on $X^*$ such that (5.5) and

$$\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]} E_\vartheta|\hat\vartheta(x) - \vartheta(x)|^2 = o(1/m)$$

are fulfilled. Then the experiments $\mathcal{F}_n$ and $\mathcal{G}_n$ are asymptotically equivalent as $n \to \infty$.



Proof of Lemma 5.4: As the estimator $\hat\vartheta$ is based on the data set $X^*$, the transformation $T$ which maps the observations $(X^*, \{(s'_{k,n}, S'_{k,n}) : k = 0, \ldots, m-1\})$ to $(X^*, \{(\tilde s_{k,n}, \tilde S_{k,n}) : k = 0, \ldots, m-1\})$ with $\tilde s_{k,n} = s'_{k,n} + \hat\vartheta(\zeta_{j(k),n}) + 1$ and $\tilde S_{k,n} = S'_{k,n} + \hat\vartheta(\zeta_{j(k),n}) - 1$ is invertible. Therefore, the experiment under which the data $(X^*, \{(\tilde s_{k,n}, \tilde S_{k,n}) : k = 0, \ldots, m-1\})$ are observed is equivalent to the experiment $\mathcal{F}_n$.
The squared Hellinger distance between the exponential densities with the same endpoint and the scaling parameters $\mu_1$ and $\mu_2$ turns out to be $2(\mu_1 - \mu_2)^2(\mu_1 + \mu_2)^{-1}(\sqrt{\mu_1} + \sqrt{\mu_2})^{-2}$. Also, (2.2) implies that $|l_{k,n} - \rho_{k,n}| \le 2$ for all $k = 0, \ldots, m-1$. We may set $\mu_{1,\pm} = \rho_{k,n}\varphi(\pm 1)\big/\!\int\varphi(t)\,dt$ and $\mu_{2,\pm} = (l_{k,n} - 2)\,\varphi(A_{0,j(k),n} \pm 1)$. Hence,

$$H^2\big(f_{\tilde s_{k,n}}, f_{d_{k,n}}\big) + H^2\big(f_{\tilde S_{k,n}}, f_{D_{k,n}}\big) \le \mathrm{const}\cdot\big(l_{k,n}^{-2} + |\hat\vartheta(\zeta_{j(k),n}) - \vartheta(\zeta_{j(k),n})|^2\big),$$

where the constant does not depend on $X^*$. Therein we have utilized condition (5.5) as well as the Lipschitz continuity, positivity and boundedness of $\varphi$. We take the expectation of the sum of these terms over $k = 0, \ldots, m-1$, which converges to zero uniformly in $\vartheta \in \Theta$ by the assumption on $m$ and the imposed convergence rate of the estimator $\hat\vartheta$. Then the asymptotic equivalence is evident by the argument (5.3) from the proof of Lemma 5.1 when replacing the data sets $Z$ and $W$ by the data samples $(d_{k,n}, D_{k,n})_{k=0,\ldots,m-1}$ and $(\tilde s_{k,n}, \tilde S_{k,n})_{k=0,\ldots,m-1}$, respectively, and inserting the conditional densities of their components given $X^*$. The sum is, of course, to be taken over $k = 0, \ldots, m-1$ instead of $j \in J_n$. □
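The closed form of the Hellinger distance between two exponential densities with a common endpoint is easy to verify. For rates $\mu_1, \mu_2$ one has $\int_0^\infty\sqrt{\mu_1\mu_2}\,e^{-(\mu_1+\mu_2)x/2}\,dx = 2\sqrt{\mu_1\mu_2}/(\mu_1+\mu_2)$, hence $H^2 = 2(\sqrt{\mu_1} - \sqrt{\mu_2})^2/(\mu_1+\mu_2) \le (\mu_1-\mu_2)^2/(4\min\{\mu_1,\mu_2\}^2)$. The last inequality is our own elementary bound; it makes explicit that only the relative difference of the rates matters, so that a discrepancy of relative order $l_{k,n}^{-1}$ between the rates yields a squared Hellinger distance of order $l_{k,n}^{-2}$. The sketch below (function names and numbers hypothetical, SciPy assumed) checks the closed form against quadrature.

```python
import math
from scipy.integrate import quad


def hellinger_sq_exp_closed(mu1: float, mu2: float) -> float:
    """Closed form of H^2 = int (sqrt f1 - sqrt f2)^2 for f_i(x) = mu_i exp(-mu_i x), x >= 0."""
    return 2.0 * (math.sqrt(mu1) - math.sqrt(mu2)) ** 2 / (mu1 + mu2)


def hellinger_sq_exp_numeric(mu1: float, mu2: float) -> float:
    """The same quantity by numerical quadrature, as a cross-check."""
    def integrand(x: float) -> float:
        return (math.sqrt(mu1) * math.exp(-0.5 * mu1 * x)
                - math.sqrt(mu2) * math.exp(-0.5 * mu2 * x)) ** 2
    value, _ = quad(integrand, 0.0, math.inf)
    return value


def relative_rate_bound(mu1: float, mu2: float) -> float:
    """Elementary upper bound (mu1 - mu2)^2 / (4 min(mu1, mu2)^2) on H^2."""
    return (mu1 - mu2) ** 2 / (4.0 * min(mu1, mu2) ** 2)


# Hypothetical block: rho = 199, l = 200 points, jump size 0.8, and a 1%
# perturbation of the second rate caused by the pilot estimator.
mu1, mu2 = 199.0 * 0.8, (200 - 2) * 0.8 * 1.01
print(hellinger_sq_exp_closed(mu1, mu2), relative_rate_bound(mu1, mu2))
print(hellinger_sq_exp_closed(4.0, 5.0), hellinger_sq_exp_numeric(4.0, 5.0))
```

Since the closed form depends on $\mu_1, \mu_2$ only through their ratio, rescaling both rates by the same factor leaves it unchanged.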



We now turn to experiments involving Poisson point processes (PPP).

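Definition 5.5. In experiment $\mathcal{H}_n$ we observe $X^*$ and, independently, two independent Poisson point processes $X_l$ and $X_u$ whose domain is the Borel σ-algebra of $\mathbb{R}^2$ and whose intensity functions equal

$$\lambda_l(x,y) = m\varphi(1)\sum_{k=0}^{m-1}\rho_{k,n}\mathbf{1}_{I_{k,n}}(x)\,\mathbf{1}_{[-C_\Theta-1,\ \vartheta(\zeta_{j(k),n})]}(y),$$

$$\lambda_u(x,y) = m\varphi(-1)\sum_{k=0}^{m-1}\rho_{k,n}\mathbf{1}_{I_{k,n}}(x)\,\mathbf{1}_{[\vartheta(\zeta_{j(k),n}),\ C_\Theta+1]}(y),$$

and are hence locally constant. We recall that $C_\Theta$ is the uniform upper bound on $|\vartheta|$ in the parameter set $\Theta$.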

We define the extreme points of $X_l$ and $X_u$ in the strip $I_{k,n} \times \mathbb{R}$ by

$$X_{l,k} = \inf\{y \in \mathbb{R} : X_l(I_{k,n} \times [y,\infty)) = 0\}, \qquad X_{u,k} = \sup\{y \in \mathbb{R} : X_u(I_{k,n} \times (-\infty,y]) = 0\}.$$

Lemma 5.5. (a) The statistic $(X_{l,k}, X_{u,k})$, $k = 0, \ldots, m-1$, is sufficient for the whole empirical information contained in $X_l$ and $X_u$.

(b) The distribution functions of $X_{l,k}$ and $X_{u,k}$ are equal to those of $\max\{-C_\Theta - 1, D_{k,n}\}$ and $\min\{C_\Theta + 1, d_{k,n}\}$, respectively, where $d_{k,n}$ and $D_{k,n}$ are as in experiment $\mathcal{G}_n$. Moreover, all $X_{l,k}$, $k = 0, \ldots, m-1$, on the one hand and all $X_{u,k}$, $k = 0, \ldots, m-1$, on the other hand are independent.



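Proof of Lemma 5.5: (a) Let $X_0$ denote the PPP with the intensity function $\lambda_0 = \mathbf{1}_{[0,1]\times[-C_\Theta-1,\,C_\Theta+1]}$. The probability measures generated by $X_0, X_l, X_u$ are denoted by $P_0, P_l, P_u$, respectively. As the functions $\lambda_0, \lambda_l, \lambda_u$ are piecewise constant and the supports of $\lambda_l$ and $\lambda_u$ are included in that of $\lambda_0$, the measure $P_0$ dominates $P_l$ and $P_u$, and the corresponding Radon-Nikodym derivatives are equal to

$$\frac{dP_l}{dP_0}(X) = \exp\Big\{\int\log\lambda_l(x,y)\,dX(x,y) - \int\big(\lambda_l(x,y) - 1\big)\,dx\,dy\Big\}$$

(and analogously for $P_u$), see e.g. Theorem 1.3 in Kutoyants (1998), which apparently goes back to Brown (1971). Therein $X$ may be viewed as an arbitrary counting process on the Borel σ-algebra of $[0,1] \times [-C_\Theta - 1, C_\Theta + 1]$. We write $\Gamma_\vartheta = \bigcup_{k=0}^{m-1} I_{k,n} \times (\vartheta(\zeta_{j(k),n}), C_\Theta + 1]$ and $\Phi = \bigcup_{k=0}^{m-1} I_{k,n} \times [-C_\Theta - 1, \tilde X_{l,k}]$, where $\tilde X_{l,k}$ equals $X_{l,k}$ except that $X_l$ is replaced by the general process $X$ in the definition. Then $dP_l/dP_0$ is equal to

$$\frac{dP_l}{dP_0}(X) = \mathbf{1}_{\{\Gamma_\vartheta\cap\Phi=\emptyset\}}\cdot\exp\Big\{\sum_{k=0}^{m-1}\log[\rho_{k,n}m\varphi(1)]\,X\big(I_{k,n}\times[-C_\Theta-1, C_\Theta+1]\big)\Big\}\cdot\exp\Big\{-\sum_{k=0}^{m-1}\big(\vartheta(\zeta_{j(k),n}) + C_\Theta + 1\big)\rho_{k,n}\varphi(1)\Big\}\exp(2C_\Theta + 2),$$

where we have used that $X(I_{k,n} \times [-C_\Theta - 1, C_\Theta + 1]) = X(I_{k,n} \times [-C_\Theta - 1, \vartheta(\zeta_{j(k),n})])$ whenever $X(\Gamma_\vartheta) = 0$, and that $\Gamma_\vartheta$ and $\Phi$ are disjoint if and only if $X(\Gamma_\vartheta) = 0$. It follows from the Fisher-Neyman factorization theorem that the $X_{l,k}$, $k = 0, \ldots, m-1$, represent a sufficient statistic for $X_l$. The corresponding assertion for the $X_{u,k}$ is proved analogously.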



(b) We consider for $x \in [-C_\Theta - 1, \vartheta(\zeta_{j(k),n})]$ that

$$P[X_{l,k} \le x] = P[X_l(I_{k,n} \times (x,\infty)) = 0] = \exp\big(-(\vartheta(\zeta_{j(k),n}) - x)\,\rho_{k,n}\varphi(1)\big) = P[D_{k,n} \le x].$$

Clearly we have $P[X_{l,k} > \vartheta(\zeta_{j(k),n})] = P[D_{k,n} > \vartheta(\zeta_{j(k),n})] = 0$ and $P[X_{l,k} < -C_\Theta - 1] = 0$, so that the distribution functions of $X_{l,k}$ and $\max\{-C_\Theta - 1, D_{k,n}\}$ coincide. The claim that $X_{u,k}$ and $\min\{C_\Theta + 1, d_{k,n}\}$ are identically distributed follows analogously. Finally, the independence of the data $X_{l,k}$, $k = 0, \ldots, m-1$, as well as of the data $X_{u,k}$, $k = 0, \ldots, m-1$, follows from the fact that $X(A_0), \ldots, X(A_{m-1})$ are independent for all $A_k \subseteq I_{k,n} \times [-C_\Theta - 1, C_\Theta + 1]$ by the definition of the PPP. □
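Part (b) of Lemma 5.5 can be illustrated by simulation. The sketch below is our own, with hypothetical parameter values: it draws the highest point $X_{l,k}$ of a homogeneous PPP on one strip $I_{k,n} \times [-C_\Theta - 1, \vartheta(\zeta_{j(k),n})]$ with intensity $m\varphi(1)\rho_{k,n}$ and compares its empirical distribution function with $\exp(-(\vartheta(\zeta_{j(k),n}) - x)\rho_{k,n}\varphi(1))$. It uses that, given their number, the points of a homogeneous PPP are i.i.d. uniform, so the maximum of $N$ of them can be sampled as $U^{1/N}$ on the unit scale; mapping an empty strip to $-C_\Theta - 1$ is our convention. NumPy is assumed.

```python
import numpy as np


def strip_maxima(rho, phi1, theta, c_theta, m, n_rep, seed=0):
    """Simulate X_{l,k}: the highest point of a PPP with constant intensity
    m * phi1 * rho on the strip I_k x [-c_theta - 1, theta], |I_k| = 1/m.
    Only y-coordinates matter, so x-coordinates are not drawn."""
    rng = np.random.default_rng(seed)
    lower = -c_theta - 1.0
    area = (theta - lower) / m
    counts = rng.poisson(m * phi1 * rho * area, size=n_rep)
    # Given N >= 1 points, the y-values are i.i.d. uniform on [lower, theta];
    # their maximum has CDF z**N on the unit scale and is sampled as U**(1/N).
    u = rng.uniform(size=n_rep)
    maxima = lower + (theta - lower) * u ** (1.0 / np.maximum(counts, 1))
    # An empty strip is mapped to the truncation level -c_theta - 1.
    return np.where(counts > 0, maxima, lower)


rho, phi1, theta, c_theta, m = 5.0, 0.8, 0.3, 1.0, 10  # hypothetical values
sample = strip_maxima(rho, phi1, theta, c_theta, m, n_rep=40_000, seed=1)
x = theta - 0.25
print(np.mean(sample <= x), np.exp(-(theta - x) * rho * phi1))
```

The width $1/m$ of the strip cancels against the factor $m$ in the intensity, which is why only $\rho_{k,n}\varphi(1)$ enters the distribution function.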



Lemma 5.6. For $m = O(n^{1-\delta})$, $\delta > 0$, the total variation distance between the distributions of $\big((\min\{C_\Theta + 1, d_{k,n}\}, \max\{-C_\Theta - 1, D_{k,n}\}) : k = 0, \ldots, m-1\big)$ and $\big((d_{k,n}, D_{k,n}) : k = 0, \ldots, m-1\big)$ converges to zero.



Proof of Lemma 5.6: Due to the independence of the data, the desired total variation distance is bounded from above by the sum of the total variation distances between the distributions of $d_{k,n}$ and $\min\{C_\Theta + 1, d_{k,n}\}$ plus the corresponding distances between the distributions of $D_{k,n}$ and $\max\{-C_\Theta - 1, D_{k,n}\}$, where $k = 0, \ldots, m-1$. The total variation distance between $d_{k,n}$ and $\min\{C_\Theta + 1, d_{k,n}\}$ is bounded by

$$2P[d_{k,n} > C_\Theta + 1] \le 2\exp(-\mathrm{const}\cdot n/m),$$

so that, because of $m \le \mathrm{const}\cdot n^{1-\delta}$, the sum of these terms for $k = 0, \ldots, m-1$ tends to zero exponentially fast. The distributions of $\max\{-C_\Theta - 1, D_{k,n}\}$ and $D_{k,n}$ are treated in the same way. □

Combining these two lemmata, we directly obtain asymptotic equivalence.

Corollary 5.2. Experiments $\mathcal{G}_n$ and $\mathcal{H}_n$ are asymptotically equivalent for $m$ as in Lemma 5.6.

We observe that the choice $m \asymp n^{2/3-\delta}$ for some $\delta \in (0, 1/6)$ meets all requirements imposed on $m$ so far, and we summarize our results.
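The bookkeeping behind the choice $m \asymp n^{2/3-\delta}$ can be made explicit with exact rational exponents. The sketch below reflects our own reading of the conditions: it tracks the sum $\sum_k l_{k,n}^{-2} \asymp m(m/n)^2$ from the proof of Lemma 5.4, the deterministic term $O(nm^{-2})$ from the proof of Proposition 6.1 (which, as far as we can tell, is what requires $\delta < 1/6$), and the growth condition of Lemma 5.6. The dictionary keys are informal labels, not notation from the text.

```python
from fractions import Fraction


def rate_exponents(delta: Fraction) -> dict:
    """For m = n**a with a = 2/3 - delta, return the exponents e of n**e for
    three error terms; a term vanishes as n -> infinity iff its exponent is < 0."""
    a = Fraction(2, 3) - delta
    return {
        "m*(m/n)**2": 3 * a - 2,  # sum over k of l_{k,n}^{-2}, l_{k,n} ~ n/m
        "n/m**2": 1 - 2 * a,       # deterministic term O(n m^{-2}) in Section 6
        "m/n": a - 1,              # m = O(n^(1 - delta')) as in Lemma 5.6
    }


def admissible(delta: Fraction) -> bool:
    return delta > 0 and all(e < 0 for e in rate_exponents(delta).values())


for d in (Fraction(0), Fraction(1, 12), Fraction(1, 6), Fraction(1, 5)):
    print(d, admissible(d), rate_exponents(d))
```

Exact fractions avoid spurious sign errors at the boundary cases $\delta = 0$ and $\delta = 1/6$, where one of the exponents is exactly zero.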

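Proposition 5.1. Select $m \asymp n^{2/3-\delta}$ for some $\delta \in (0, 1/6)$ and suppose that there is an estimator $\hat\vartheta$, based on the data $X^*$ alone, which satisfies (5.5) and

$$\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]} E_\vartheta|\hat\vartheta(x) - \vartheta(x)|^2 = O(n^{-2/3}).$$

Then we have asymptotic equivalence between experiments $\mathcal{C}_n$ and $\mathcal{H}_n$. Moreover, if we have additionally

$$\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]} E_\vartheta|\hat\vartheta'(x) - \vartheta'(x)| = o(n^{-1/3-\delta}),$$

then also $\mathcal{A}_n^*$ and $\mathcal{H}_n$ are asymptotically equivalent.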

6. Localization of the PPP model 

The processes $X_l$ and $X_u$ in the experiment $\mathcal{H}_n$ have step functions as their intensity boundaries, which approximate continuous functions as $m$ tends to infinity. Therefore we now consider the experiment where $X^*$ and, independently, two PPPs with boundary function $\vartheta$ are observed.

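Definition 6.1. In experiment $\mathcal{I}_n$ we observe $X^*$ and, independently, two independent PPPs $X_{1,0}$ and $X_{2,0}$ with intensities

$$\lambda_{1,0}(x,y) = (n/2)f_\varepsilon(1)f_D(x)\,\mathbf{1}_{[-C_\Theta-1,\ \vartheta(x)]}(y),$$

$$\lambda_{2,0}(x,y) = (n/2)f_\varepsilon(-1)f_D(x)\,\mathbf{1}_{[\vartheta(x),\ C_\Theta+1]}(y). \tag{6.1}$$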



Proposition 6.1. We impose the conditions of Lemma 4.1 and, in addition, that for all $\vartheta \in \Theta$ we have

$$\sup_{x\in[0,1]}|\hat\vartheta'(x)| \le 2\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]}|\vartheta'(x)| \quad \text{a.s.} \tag{6.2}$$

Then the experiments $\mathcal{H}_n$ and $\mathcal{I}_n$ are asymptotically equivalent.



Proof of Proposition 6.1: First, we show asymptotic equivalence of the experiment $\mathcal{H}_n$ with the experiment $\tilde{\mathcal{H}}_n$ in which one observes the data $(X^*, \tilde X_1, \tilde X_2)$, where $\tilde X_1$ and $\tilde X_2$ are PPPs with the intensity functions

$$\tilde\lambda_1(x,y) = (n/2)f_\varepsilon(1)f_D(x)\,\mathbf{1}_{[-C_\Theta-1,\ \vartheta(x) - \hat\vartheta'(x)(x - \xi(x))]}(y),$$

$$\tilde\lambda_2(x,y) = (n/2)f_\varepsilon(-1)f_D(x)\,\mathbf{1}_{[\vartheta(x) - \hat\vartheta'(x)(x - \xi(x)),\ C_\Theta+1]}(y),$$

conditionally on $X^*$, respectively. Here, $\hat\vartheta$ denotes the pilot estimator from Lemma 4.1 based on the data set $X^*$, and we write $\xi(x)$ for the centre of that interval $I_{k,n}$ which contains the element $x$.



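By an argument similar to (4.3), it suffices to show that the expected Hellinger distance between the distributions of $X_l$ and $\tilde X_1$ on the one hand and of $X_u$ and $\tilde X_2$ on the other hand converges to zero. We now employ a general formula bounding the Hellinger distance between two PPP laws $P_1, P_2$ with respective intensities $\lambda_1, \lambda_2$ by the (generalized) Hellinger distance of the intensities: when $P$ denotes the law of the PPP with intensity $\lambda = \lambda_1 + \lambda_2$, we derive from the likelihood expression

$$\begin{aligned} H^2(P_1, P_2) &= 2\Big(1 - E_P\Big[\exp\Big(\int\tfrac12\big(\log(\lambda_1/\lambda) + \log(\lambda_2/\lambda)\big)\,dX - \int\big(\tfrac12(\lambda_1 + \lambda_2) - \lambda\big)\Big)\Big]\Big) \\ &= 2\Big(1 - E_P\Big[\exp\Big(\int\log\big(\sqrt{\lambda_1\lambda_2}/\lambda\big)\,dX - \int\big(\sqrt{\lambda_1\lambda_2}/\lambda - 1\big)\lambda\Big)\Big]\exp\Big(-\tfrac12\int\big(\sqrt{\lambda_1} - \sqrt{\lambda_2}\big)^2\Big)\Big) \\ &= 2\Big(1 - \exp\Big(-\tfrac12\int\big(\sqrt{\lambda_1} - \sqrt{\lambda_2}\big)^2\Big)\Big) \\ &\le \int\big(\sqrt{\lambda_1} - \sqrt{\lambda_2}\big)^2, \end{aligned} \tag{6.3}$$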
where we have used the fact that the Radon-Nikodym derivative of the PPP law with intensity $\sqrt{\lambda_1\lambda_2}$ with respect to $P$ integrates to one under $P$; see also Le Cam and Yang (2000) for a related result. Thus we bound the Hellinger distance between the intensities of $X_l$ and $\tilde X_1$ by

$$\int\big(\sqrt{\lambda_l} - \sqrt{\tilde\lambda_1}\big)^2 \le \mathrm{const}\cdot n\Big(\int_0^1\big|\vartheta(\xi(x)) - \vartheta(x) + \hat\vartheta'(x)(x - \xi(x))\big|\,dx + \sum_{k=0}^{m-1}\int_{I_{k,n}}\Big|f_D(x) - m\int_{I_{k,n}} f_D(y)\,dy\Big|^2 dx\Big),$$

where the constant does not depend on $X^*$. As $f_D$ is assumed to be Lipschitz on $[0,1]$, the latter term contributes to the asymptotic order by the deterministic upper bound $O(nm^{-2})$, independently of $\hat\vartheta$. Then we apply the expectation to the above expression and obtain

$$O(nm^{-2}) + \mathrm{const}\cdot nm^{-1}\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]} E_\vartheta|\hat\vartheta'(x) - \vartheta'(x)| = o(1)$$

as a uniform upper bound. Together with the same bound for the Hellinger distance, conditionally on $X^*$, between the intensities of $\tilde X_2$ and $X_u$, this implies asymptotic equivalence between $\tilde{\mathcal{H}}_n$ and $\mathcal{H}_n$, again by arguments as in (5.3).
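The formula $H^2(P_1,P_2) = 2\big(1 - \exp(-\tfrac12\int(\sqrt{\lambda_1} - \sqrt{\lambda_2})^2)\big)$ for PPP laws can be checked in the simplest special case. If both intensities are constant, say $a$ and $b$, on a common set of Lebesgue measure $A$, then conditionally on their number the points of either process are i.i.d. uniform on that set, so the Hellinger distance between the PPP laws reduces to the one between Poisson$(aA)$ and Poisson$(bA)$. The sketch below is our own (numbers hypothetical): it sums the Poisson series directly and compares the result with the closed form and with the simpler upper bound $A(\sqrt a - \sqrt b)^2$.

```python
import math


def poisson_hellinger_sq(mu1: float, mu2: float, kmax: int = 1000) -> float:
    """H^2 = sum_k (sqrt p_k - sqrt q_k)^2 = 2 (1 - sum_k sqrt(p_k q_k))
    between Poisson(mu1) and Poisson(mu2), summing the series up to kmax."""
    bc = 0.0  # Bhattacharyya coefficient sum_k sqrt(p_k q_k)
    for k in range(kmax + 1):
        log_p = -mu1 + k * math.log(mu1) - math.lgamma(k + 1)
        log_q = -mu2 + k * math.log(mu2) - math.lgamma(k + 1)
        bc += math.exp(0.5 * (log_p + log_q))
    return 2.0 * (1.0 - bc)


def ppp_hellinger_sq(a: float, b: float, area: float) -> float:
    """2 (1 - exp(-1/2 int (sqrt(lambda_1) - sqrt(lambda_2))^2)) for constant
    intensities a, b on a common set of Lebesgue measure `area`."""
    return 2.0 * (1.0 - math.exp(-0.5 * area * (math.sqrt(a) - math.sqrt(b)) ** 2))


a, b, area = 30.0, 36.0, 0.5  # hypothetical intensities and domain size
print(poisson_hellinger_sq(a * area, b * area), ppp_hellinger_sq(a, b, area))
```

Analytically, $\sum_k\sqrt{p_kq_k} = \exp(-(\mu_1+\mu_2)/2 + \sqrt{\mu_1\mu_2}) = \exp(-(\sqrt{\mu_1} - \sqrt{\mu_2})^2/2)$, which is exactly the closed form above.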

For any two-dimensional Borel set $B$ let us define the pointwise shifted version

$$\tilde B = \big\{(x,y) \in \mathbb{R}^2 : (x,\ y + \hat\vartheta'(x)[x - \xi(x)]) \in B\big\},$$

and the processes $\bar X_j(B) = \tilde X_j(\tilde B)$, $j = 1,2$, conditionally on the data set $X^*$. Note that $\tilde B$ is a Borel set as well whenever the shift function $\hat\vartheta'(\cdot)[\cdot - \xi(\cdot)]$ is piecewise continuous on the intervals $I_{k,n}$. Then $\bar X_j$ represents a PPP with the shifted intensity function

$$\bar\lambda_1(x,y) = (n/2)f_\varepsilon(1)f_D(x)\,\mathbf{1}_{[-C_\Theta-1+\hat\vartheta'(x)(x - \xi(x)),\ \vartheta(x)]}(y),$$

$$\bar\lambda_2(x,y) = (n/2)f_\varepsilon(-1)f_D(x)\,\mathbf{1}_{[\vartheta(x),\ C_\Theta+1+\hat\vartheta'(x)(x - \xi(x))]}(y).$$

Note that this transformation is invertible as long as the data set $X^*$ is available. Therefore, the experiment $\bar{\mathcal{H}}_n$ of observing $X^*$ and $\bar X_j$, $j = 1,2$, independently is equivalent to the experiment $\tilde{\mathcal{H}}_n$.

By the imposed upper bound on the estimator $\hat\vartheta'$ we may assume that

$$\sup_{\vartheta\in\Theta}\sup_{x\in[0,1]}|\hat\vartheta'(x)|\,|\xi(x) - x| \le 1/2$$

for $m$ sufficiently large. Hence, the observation of $\bar X_j$, $j = 1,2$, is equivalent to the observation of two conditionally independent Poisson processes $X_{j,1}$ and $X_{j,2}$ with the intensity functions

$$\lambda_{1,1}(x,y) = (n/2)f_\varepsilon(1)f_D(x)\,\mathbf{1}_{[-C_\Theta-1/2,\ \vartheta(x)]}(y),$$

$$\lambda_{1,2}(x,y) = (n/2)f_\varepsilon(1)f_D(x)\,\mathbf{1}_{[-C_\Theta-1+\hat\vartheta'(x)(x - \xi(x)),\ -C_\Theta-1/2)}(y),$$

$$\lambda_{2,1}(x,y) = (n/2)f_\varepsilon(-1)f_D(x)\,\mathbf{1}_{[\vartheta(x),\ C_\Theta+1/2]}(y),$$

$$\lambda_{2,2}(x,y) = (n/2)f_\varepsilon(-1)f_D(x)\,\mathbf{1}_{(C_\Theta+1/2,\ C_\Theta+1+\hat\vartheta'(x)(x - \xi(x))]}(y).$$

Thus all processes $X_{j,i}$, $i,j = 1,2$, are independent. Also we realize that the processes $X_{1,2}$ and $X_{2,2}$ represent conditionally ancillary statistics given the data set $X^*$, as $\lambda_{1,2}$ and $\lambda_{2,2}$ do not explicitly depend on $\vartheta$ but are fixed by knowledge of $X^*$ for $n$ sufficiently large. Therefore, the observation of $X^*$ and $X_{j,1}$, $j = 1,2$, is sufficient for the complete empirical information contained in experiment $\bar{\mathcal{H}}_n$. On the other hand, we may also add two independent PPPs $X_{j,3}$, $j = 1,2$, with the intensity functions

$$\lambda_{1,3}(x,y) = (n/2)f_\varepsilon(1)f_D(x)\,\mathbf{1}_{[-C_\Theta-1,\ -C_\Theta-1/2)}(y),$$

$$\lambda_{2,3}(x,y) = (n/2)f_\varepsilon(-1)f_D(x)\,\mathbf{1}_{(C_\Theta+1/2,\ C_\Theta+1]}(y),$$

which are totally uninformative. Combining the independent processes $X_{j,1}$ and $X_{j,3}$, whose intensity functions are supported on (almost) disjoint domains for both $j = 1,2$, the considered experiment is equivalent to the experiment $\mathcal{I}_n$. □



7. Final proof 



In this section, we combine all results derived in the previous sections in order to complete the proof of Theorem 2.1. For simplicity we suppose that $n$ is even. By Proposition 3.1 with sample size $n/2$, there exists an estimator $\hat\vartheta$ based on the data $X = X^*$ from experiment $\mathcal{A}_n$ which satisfies the conditions of Proposition 5.1, e.g. by choosing $\delta = \alpha/2$. Therefore, experiments $\mathcal{A}_n$ and $\mathcal{I}_n$ are asymptotically equivalent by Propositions 5.1 and 6.1. The conditions (5.5) and (6.2) are satisfied when truncating the range of $\hat\vartheta$ and $\hat\vartheta'$ suitably without losing validity of Proposition 3.1. Therein, note that the uniform upper bounds on $\vartheta \in \Theta$ as well as on its derivative are known. Then we set $\mathcal{A}_n^* = \mathcal{I}_n$ by using the processes $X_{1,0}$ and $X_{2,0}$ as the data set $X^*$ and let $X$ take the role of the data $Y'$ from experiment $\mathcal{A}_n$. Note that all of our arguments from the previous sections remain valid when transforming the responses with even instead of odd observation number. Applying Propositions 5.1 and 6.1 again, we obtain asymptotic equivalence of the experiments $\mathcal{I}_n$ and $\mathcal{J}_n$, where the latter model just consists of $X_{1,0}$ and $X_{2,0}$ and two independent copies $X^*_{1,0}$ and $X^*_{2,0}$. The likelihood processes of experiments $\mathcal{J}_n$ and $\mathcal{B}_n$ turn out to be the same, using Theorem 1.3 in Kutoyants (1998) as in the proof of Lemma 5.5, so that $\mathcal{J}_n$ and $\mathcal{B}_n$ are equivalent experiments. The concrete equivalence mapping is given by taking the sum of the processes $X_j = X_{j,0} + X^*_{j,0}$, $j = 1,2$, in one direction and, in the other direction, by splitting the point masses in $X_j$ randomly and independently with probability one half into point masses for $X_{j,0}$ and $X^*_{j,0}$ (thinning of a PPP).
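The equivalence mapping between $\mathcal{J}_n$ and $\mathcal{B}_n$ rests on the superposition and thinning properties of Poisson processes: independent PPPs with intensity $\lambda$ add up to a PPP with intensity $2\lambda$, and marking each point of the latter independently with probability one half recovers two independent PPPs with intensity $\lambda$. The Monte Carlo sketch below is our own (unit square domain, constant intensity and sample sizes are hypothetical) and checks the thinning direction on the level of point counts: both counts should have mean and variance close to $\lambda$ and be uncorrelated.

```python
import numpy as np


def superpose_and_thin(lam: float, n_rep: int, seed: int = 0) -> np.ndarray:
    """For each replicate, draw a PPP with intensity 2*lam on the unit square
    (hypothetical domain of area 1) and assign every point independently with
    probability 1/2 to the first or the second process. Returns the point counts
    of the two resulting processes, shape (n_rep, 2)."""
    rng = np.random.default_rng(seed)
    counts = np.empty((n_rep, 2), dtype=int)
    for r in range(n_rep):
        n_pts = rng.poisson(2.0 * lam)         # superposed process: intensity 2*lam
        pts = rng.uniform(size=(n_pts, 2))     # homogeneous: i.i.d. uniform points
        to_first = rng.random(n_pts) < 0.5     # independent fair coin per point
        first, second = pts[to_first], pts[~to_first]
        counts[r] = (len(first), len(second))
    return counts


counts = superpose_and_thin(lam=40.0, n_rep=20_000, seed=0)
print(counts.mean(axis=0), counts.var(axis=0), np.corrcoef(counts.T)[0, 1])
```

Vanishing correlation of the counts is only a necessary condition for independence, but together with the Poisson mean and variance it is a quick sanity check of the thinning step.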

8. Discussion 

8.1. General remarks. We have shown asymptotic equivalence of nonparametric regression with non-regular additive errors and the observation of two specific independent PPPs. Our result also yields that these nonparametric regression models are asymptotically equivalent to each other as long as the corresponding error densities have the same jump sizes at $-1$ and $+1$ and are Lipschitz continuous and positive within the interval $(-1,1)$, regardless of the specific shape of the density inside its support. This unifies the asymptotic theory for these experiments, and properties such as asymptotic minimax bounds, adaptation and superefficiency can be studied simultaneously for these models. At least after a suitable linear correction by a pilot estimator, local minima and maxima are asymptotically sufficient for inference in these models.

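The limiting Poisson point process model $\mathcal{B}_n$ exhibits a fascinating new geometric structure. According to (6.3), the squared Hellinger distance between the observation laws with parameters $\vartheta_1, \vartheta_2 \in \Theta$ is given by

$$H^2(P_{\vartheta_1}, P_{\vartheta_2}) = 2\Big(1 - \exp\Big(-\frac{n}{2}\big(f_\varepsilon(-1) + f_\varepsilon(+1)\big)\int|\vartheta_1(x) - \vartheta_2(x)|\,f_D(x)\,dx\Big)\Big).$$

Setting $\|g\|_{L^1(f_D)} := \int|g(x)|f_D(x)\,dx$, the squared Hellinger distance is thus equivalent to an $L^1$-distance, as long as the right-hand side below remains bounded:

$$H^2(P_{\vartheta_1}, P_{\vartheta_2}) \asymp n\{f_\varepsilon(-1) + f_\varepsilon(+1)\}\,\|\vartheta_1 - \vartheta_2\|_{L^1(f_D)}. \tag{8.1}$$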



In contrast, for nonparametric regression with regular errors the continuous limit model is a Gaussian shift, where the corresponding squared Hellinger distance is equivalent to $n\sigma^{-2}\|\vartheta_1 - \vartheta_2\|_{L^2}^2$ with $\sigma^2 = \mathrm{Var}(\varepsilon_{j,n})$. While it is well known that the standard parametric rate improves from $n^{-1/2}$ to $n^{-1}$, the nonparametric view reveals that we face here an $L^1$-topology instead of the usual Hilbert space $L^2$-structure. As discussed below, this different Banach space geometry is even visible at the level of minimax rates, which are in general worse than for regular nonparametric regression with sample size $n^2$. A boundary behaviour of the error density other than finite jumps will imply a different Hellinger topology; in particular, the whole range of $L^p$-geometries, $p \in (0,\infty)$, might arise, whose statistical consequences will be far-reaching and remain to be explored in detail.

8.2. A nonparametric lower bound. Let us apply the asymptotic equivalence result to study nonparametric lower bounds for all models in $\mathcal{A}_n$ and for $\mathcal{B}_n$ simultaneously. We content ourselves here with rate results, but we track explicitly the dependence on the total jump size $J := f_\varepsilon(-1) + f_\varepsilon(1)$ and the design density $f_D$.

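Proposition 8.1. In the PPP model $\mathcal{B}_n$, but with $\vartheta$ from the parameter space

$$\Theta_{s,L} := \{\vartheta \in C^s([0,1]) : \|\vartheta\|_s \le L\}, \quad s, L > 0,$$

with generalized Hölder norm

$$\|g\|_s := \max_{k=0,1,\ldots,\lfloor s\rfloor}\|g^{(k)}\|_\infty + \sup_{x\ne y}\frac{|g^{(\lfloor s\rfloor)}(x) - g^{(\lfloor s\rfloor)}(y)|}{|x - y|^{s - \lfloor s\rfloor}},$$

the following lower bound for the pointwise loss in estimating $\vartheta$ and its derivatives at $x_0 \in [0,1]$ holds uniformly in $J := f_\varepsilon(-1) + f_\varepsilon(1)$ and $f_D(x_0)$:

$$\liminf_{n\to\infty}\inf_{\hat\vartheta_n}\sup_{\vartheta\in\Theta_{s,L}} P_\vartheta\Big(|\hat\vartheta_n^{(k)}(x_0) - \vartheta^{(k)}(x_0)| \ge c_0 L^{(k+1)/(s+1)}(nJf_D(x_0))^{-(s-k)/(s+1)}\Big) > 0,$$

with $c_0 > 0$, where the infimum is taken over all estimators in $\mathcal{B}_n$ and $k = 0, 1, \ldots, \lfloor s\rfloor$.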

By asymptotic equivalence and the boundedness of the involved loss function $\mathbf{1}\{|\hat\vartheta_n^{(k)}(x_0) - \vartheta^{(k)}(x_0)| \ge c_0 L^{(k+1)/(s+1)}(nJf_D(x_0))^{-(s-k)/(s+1)}\}$, this result immediately generalizes to the regression experiments $\mathcal{A}_n$, provided the regularity $s$ is larger than two. Moreover, by Markov's inequality it also applies to the $p$-th moment risk. We thus have:

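Corollary 8.1. For estimators $\hat\vartheta_n$ in experiment $\mathcal{A}_n$ with $\vartheta \in \Theta_{s,L} \subseteq \Theta$ and $s > 2$, $L > 0$, we have for all $p > 0$ and $k = 0, 1, \ldots, \lfloor s\rfloor$ the lower bound

$$\liminf_{n\to\infty} L^{-(k+1)/(s+1)}(nJf_D(x_0))^{(s-k)/(s+1)}\inf_{\hat\vartheta_n}\sup_{\vartheta\in\Theta_{s,L}}\big(E_\vartheta|\hat\vartheta_n^{(k)}(x_0) - \vartheta^{(k)}(x_0)|^p\big)^{1/p} \ge c_1$$

for some constant $c_1 > 0$.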



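Proof of Proposition 8.1: Let us fix $k \in \{0, 1, \ldots, \lfloor s\rfloor\}$. By Theorem 2.2(ii) in Tsybakov (2009) it suffices to find $\vartheta_1, \vartheta_2 \in \Theta_{s,L}$ with

$$|\vartheta_1^{(k)}(x_0) - \vartheta_2^{(k)}(x_0)| \ge 2c_0 L^{(k+1)/(s+1)}(nJf_D(x_0))^{-(s-k)/(s+1)}$$

and Hellinger distance of the corresponding observation laws satisfying $H^2(P_{\vartheta_1}, P_{\vartheta_2}) \le \alpha_0$ for some fixed $\alpha_0 < 2$.

We choose some kernel function $K \in \Theta_{s,1}$ with $\int|K(x)|\,dx = 1$, $K^{(k)}(0) > 0$ and support in $[-1/2, 1/2]$, and we set $\vartheta_1(x) = 0$, $\vartheta_2(x) = Lh^sK((x - x_0)/h)$ with $h = (LnJf_D(x_0))^{-1/(s+1)}$ (using one-sided kernel versions near the boundary). Then for $n$ sufficiently large we have $\vartheta_1, \vartheta_2 \in \Theta_{s,L}$, and moreover, by (6.3) (cf. (8.1)),

$$H^2(P_{\vartheta_1}, P_{\vartheta_2}) \le nJ\int|\vartheta_1(x) - \vartheta_2(x)|\,f_D(x)\,dx,$$

and the integral satisfies $\int|\vartheta_2(x)|f_D(x)\,dx = (L + o(1))h^{s+1}f_D(x_0)$ as $h \to 0$. We conclude that $H^2(P_{\vartheta_1}, P_{\vartheta_2}) \le 1 + o(1)$ for $n \to \infty$. The result therefore follows from

$$|\vartheta_2^{(k)}(x_0) - \vartheta_1^{(k)}(x_0)| = Lh^{s-k}K^{(k)}(0) = K^{(k)}(0)\,L^{(k+1)/(s+1)}(nJf_D(x_0))^{-(s-k)/(s+1)}.$$

□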
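The arithmetic of the two-point construction can be traced numerically for $k = 0$. In the sketch below (our own), the kernel $K(u) = 30(1/4 - u^2)^2$ on $[-1/2, 1/2]$ and the linear design density $f_D(x) = 1/2 + x$ are hypothetical choices: the kernel has $\int|K| = 1$ and $K(0) = 15/8$, but it is only $C^1$, so it illustrates the scaling and is not itself an admissible element of $\Theta_{s,1}$ for $s > 2$. The code returns the bandwidth $h$, the separation $\vartheta_2(x_0) = Lh^sK(0)$ and the quantity $nJ\int|\vartheta_2|f_D$, which controls the Hellinger distance. SciPy is assumed.

```python
from scipy.integrate import quad


def bump_kernel(u: float) -> float:
    """Hypothetical kernel on [-1/2, 1/2] with K >= 0, int |K| = 1, K(0) = 15/8."""
    return 30.0 * (0.25 - u * u) ** 2 if abs(u) <= 0.5 else 0.0


def f_D(x: float) -> float:
    """Hypothetical design density on [0, 1] (integrates to one)."""
    return 0.5 + x


def two_point_quantities(n, J, L, s, x0):
    """Bandwidth h, separation theta_2(x0) and n J int |theta_2| f_D for
    theta_2(x) = L h^s K((x - x0)/h) with h = (L n J f_D(x0))^(-1/(s+1))."""
    h = (L * n * J * f_D(x0)) ** (-1.0 / (s + 1.0))
    separation = L * h ** s * bump_kernel(0.0)
    # substitute x = x0 + h u so that the integrand is of order one
    integral, _ = quad(lambda u: bump_kernel(u) * f_D(x0 + h * u), -0.5, 0.5)
    hellinger_bound = n * J * L * h ** (s + 1.0) * integral
    return h, separation, hellinger_bound


n, J, L, s, x0 = 10**6, 1.3, 2.0, 2.5, 0.5
print(two_point_quantities(n, J, L, s, x0))
```

For this symmetric kernel and linear $f_D$ the last quantity equals one exactly, and the separation equals $K(0)L^{1/(s+1)}(nJf_D(x_0))^{-s/(s+1)}$.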

The rate $L^{(k+1)/(s+1)}n^{-(s-k)/(s+1)}$ instead of $L^{(k+1/2)/(s+1/2)}n^{-(s-k)/(2s+1)}$ for regular nonparametric regression is obviously due to the $L^1$-bound on $\vartheta_2$ instead of the squared $L^2$-bound. Let us mention that a careful study of our upper bound proof in Proposition 3.1 will also yield the same dependence on $L = C_\Theta$ for regularity $s = 2 + \alpha$ and $k \in \{0, 1\}$. More geometrically, we can establish a lower bound for estimating a linear functional $\mathcal{L}(\vartheta)$ by maximising $\mathcal{L}(\vartheta)$ over $\vartheta \in \Theta_{s,L}$ with $\|\vartheta\|_{L^1(f_D)} \le 1/(nJ)$. In the scale of Besov spaces $B^\alpha_{p,p}$ with norms $\|\cdot\|_{\alpha,p}$, $\alpha \in \mathbb{R}$, $1 \le p \le \infty$, we have $\|\vartheta\|_{L^1} \ge \|\vartheta\|_{-1,\infty}$ by duality from $\|g\|_{L^\infty} \le \|g\|_{1,1}$. Here, we can therefore expect to maximise $\mathcal{L}(\vartheta) = \vartheta^{(k)}(x_0)$ as far as the interpolation inequality

$$\|\vartheta\|_{k,\infty} \le \|\vartheta\|_{-1,\infty}^{(s-k)/(s+1)}\,\|\vartheta\|_{s,\infty}^{(k+1)/(s+1)} \le \mathrm{const}\cdot(nJ)^{-(s-k)/(s+1)}L^{(k+1)/(s+1)}$$

permits. This is in fact achieved by the choice of $\vartheta_2$ above, which also involves the localized value $f_D(x_0)$. In the corresponding regular nonparametric regression model the Hellinger constraint is given by $\|\vartheta\|_{L^2}^2 \le \sigma^2/n$, and we use $\|\vartheta\|_{L^2} \ge \|\vartheta\|_{-1/2,\infty}$ by duality from $\|g\|_{L^2} \le \|g\|_{1/2,1}$ to obtain the interpolation inequality

$$\|\vartheta\|_{k,\infty} \le \|\vartheta\|_{-1/2,\infty}^{(s-k)/(s+1/2)}\,\|\vartheta\|_{s,\infty}^{(k+1/2)/(s+1/2)} \le \mathrm{const}\cdot(\sigma^{-2}n)^{-(s-k)/(2s+1)}L^{(k+1/2)/(s+1/2)},$$

which similarly reveals the minimax rate in the regular case. Very roughly, we might therefore say that the PPP noise induces a regularity $-1$ in the Hölder scale, while the Gaussian white noise leads to the higher regularity $-1/2$. In analogy with $\sigma/\sqrt{n}$ in the regular case, we might call $1/(nJ)$ the noise level for the regression problem with irregular noise and $nJf_D(x_0)$ the effective local sample size at $x_0$.

8.3. One-sided frontier estimation. In many of the applications mentioned in the introduction, the noise density $f_\varepsilon$ has just one jump and not two as in our model $\mathcal{A}_n$. We want to stress that our proof of asymptotic equivalence can also cover the one-jump case. To make the analogy clear, let us assume that $f_\varepsilon$ is still a density on $[-1,1]$ with $f_\varepsilon(-1) > 0$ and $f_\varepsilon(1) = 0$. Instead of positivity and Lipschitz continuity, we now require $f_\varepsilon$ to be Lipschitz continuous and Hellinger differentiable on $[-1,1]$, i.e. $\sqrt{f_\varepsilon}$ is weakly differentiable with derivative in $L^2([-1,1])$. Note that $f_\varepsilon$ can then be extended to a function on the real line with the same local properties. All other properties of the model $\mathcal{A}_n$ are kept the same.

For the pilot estimator in this model we can obtain the same convergence rates when we select the admissible local polynomial which is the smallest at $x_0$. Lemma 4.1 remains the same, while in Definition 5.1 of experiment $\mathcal{D}_n$ we adjust only the left boundary of the density and set

$$f_{W,j}(x) = \varphi(x)\Big[\int_{A_{0,j,n}-1}^{\infty}\varphi(t)\,dt\Big]^{-1}\mathbf{1}_{[A_{0,j,n}-1,\,\infty)}(x), \quad j \in J_n.$$

Lemma 5.1 then remains true as well, using the Hellinger differentiability in the proof instead of the uniform positivity. From the form of the density of $W$ we conclude this time that the local minima $s_{k,n} = \min\{W_{j,n} : x_{j,n} \in I_{k,n}\}$, $k = 0, \ldots, m-1$, are conditionally sufficient. Then all the remaining results stay valid if we just consider $s_{k,n}$ instead of $(s_{k,n}, S_{k,n})$ and merely the upper PPP model. Consequently, this establishes asymptotic equivalence with the PPP $X_2$ of experiment $\mathcal{B}_n$. In this PPP model the regression function appears as the lower frontier of a Poisson point process with intensity $nf_\varepsilon(-1)f_D(x)$ on its epigraph. Frontier estimation where the support of $f_\varepsilon$ is $[-1,\infty)$ or $(-\infty,1]$, respectively, can be treated analogously. In a general model, the case of a regular density with finitely many jumps at known locations might be treated, which should also be asymptotically equivalent to suitable PPP models.



8.4. Counterexample for regularity one. We give a short argument that, for equidistant design $x_{j,n} = (j-1)/(n-1)$ and parameter classes $\Theta$ where the target function $\vartheta \in \Theta$ is only required to satisfy $\|\vartheta'\|_\infty \le C$ for some $C > 0$, the experiments $\mathcal{A}_n$ and $\mathcal{B}_n$ are not asymptotically equivalent. Whether Hölder classes of order $1 + \alpha$ instead of $2 + \alpha$ suffice as parameter sets for establishing asymptotic equivalence remains a challenging open question.

Let us consider the function $f_n(x) = C(\pi(n-1))^{-1}\sin(\pi(n-1)x)$, so that $\|f_n'\|_\infty = C$ holds for all $n > 1$. Now observe that $f_n$ satisfies $f_n(x_{j,n}) = 0$ for all $j = 1, \ldots, n$. This means in particular that in the regression experiment $\mathcal{A}_n$ the observations with regression function $f_n$ cannot be distinguished from those with zero regression function. In experiment $\mathcal{B}_n$, however, a test between $H_0: \vartheta = 0$ and $H_1: \vartheta = f_n$ of the form $T_n = \mathbf{1}\{X_1([0,1]\times\mathbb{R}_+) > 0 \text{ or } X_2([0,1]\times\mathbb{R}_-) > 0\}$ satisfies $P_0(T_n = 0) = 1$ and

$$P_{f_n}(T_n = 1) = 1 - \exp\Big(-n\int_0^1|f_n(x)|\,dx\Big) = 1 - \exp\big(-2C\pi^{-2}n(n-1)^{-1}\big) \to 1 - \exp(-2C/\pi^2) > 0$$

for $n \to \infty$. Consequently, testing between $H_0$ and $H_1$ in experiment $\mathcal{B}_n$ is possible with non-trivial power uniformly over $n$. This implies that experiments $\mathcal{A}_n$ and $\mathcal{B}_n$ are asymptotically non-equivalent.
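The counterexample is easy to reproduce numerically. The sketch below is our own ($C = 1$ and the grid size are hypothetical choices): it evaluates $f_n$ on the equidistant design $x_{j,n} = (j-1)/(n-1)$, approximates $\int_0^1|f_n|$ by the midpoint rule (exact value $2C\pi^{-2}(n-1)^{-1}$), and evaluates the power $1 - \exp(-2C\pi^{-2}n(n-1)^{-1})$, i.e. $1 - \exp(-n\int_0^1|f_n|)$, of the test $T_n$.

```python
import math


def f_n(x: float, n: int, C: float) -> float:
    return C / (math.pi * (n - 1)) * math.sin(math.pi * (n - 1) * x)


def counterexample_quantities(n: int, C: float, grid: int = 200_000):
    # equidistant design x_{j,n} = (j - 1)/(n - 1), j = 1, ..., n
    design = [(j - 1) / (n - 1) for j in range(1, n + 1)]
    max_at_design = max(abs(f_n(x, n, C)) for x in design)
    # midpoint rule for int_0^1 |f_n(x)| dx; the kinks of |f_n| fall on cell edges
    l1 = sum(abs(f_n((i + 0.5) / grid, n, C)) for i in range(grid)) / grid
    power = 1.0 - math.exp(-n * l1)  # P_{f_n}(T_n = 1)
    return max_at_design, l1, power


C = 1.0
for n in (11, 101, 1001):
    print(n, counterexample_quantities(n, C))
print("limit of the power:", 1.0 - math.exp(-2.0 * C / math.pi ** 2))
```

Up to rounding errors, $f_n$ vanishes at every design point, while the power of $T_n$ stays bounded away from zero.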

References 

Brown, L.D. and Low, M. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24, 2384-2398.

Brown, L., Cai, T., Low, M. and Zhang, C.-H. (2002). Asymptotic equivalence theory for nonparametric regression with random design. Ann. Statist. 30, 688-707.

Brown, L., Cai, T. and Zhou, H.H. (2010). Nonparametric regression in exponential families. Ann. Statist. 38, 2005-2046.

Brown, M. (1971). Discrimination of Poisson processes. Ann. Math. Statist. 42, 773-776.

Carter, A. (2007). Asymptotic approximation of nonparametric regression experiments with unknown variances. Ann. Statist. 35, 1644-1673.

Carter, A. (2009). Asymptotically sufficient statistics in nonparametric regression experiments with correlated noise. J. Prob. Statist. 2009, ID 275308 (19 pages).

Chernozhukov, V. and Hong, H. (2004). Likelihood estimation and inference in a class of nonregular econometric models. Econometrica 72, 1445-1480.

DeVore, R.A. and Lorentz, G.G. (1993). Constructive Approximation, Grundlehren Series 303, Springer, Berlin.

Gijbels, I., Mammen, E., Park, B. and Simar, L. (1999). On estimation of monotone and concave frontier functions. J. Amer. Statist. Assoc. 94, 220-228.

Grama, I. and Nussbaum, M. (1998). Asymptotic equivalence for nonparametric generalized linear models. Prob. Th. Rel. Fields 111, 167-214.

Grama, I. and Nussbaum, M. (2002). Asymptotic equivalence for nonparametric regression. Math. Meth. Stat. 11(1), 1-36.

Hall, P. and van Keilegom, I. (2009). Nonparametric "regression" when errors are positioned at end-points. Bernoulli 15, 614-633.

Janssen, A. and Marohn, D.M. (1994). On statistical information of extreme order statistics, local extreme value alternatives and Poisson point processes. J. Multivar. Anal. 48, 1-30.

Karr, A.F. (1991). Point Processes and Their Statistical Inference, 2nd ed., Marcel Dekker, New York.

Knight, K. (2001). Limiting Distributions of Linear Programming Estimators. Extremes 4, 87-103.

Korostelev, A.P. and Tsybakov, A.B. (1993). Minimax Theory of Image Reconstruction, Lecture Notes in Statistics 82, Springer, New York.

Kutoyants, Y.A. (1998). Statistical Inference for Spatial Poisson Processes, Lecture Notes in Statistics 134, Springer, New York.

Le Cam, L.M. (1964). Sufficiency and approximate sufficiency. Ann. Math. Statist. 35, 1419-1455.

Le Cam, L.M. and Yang, G.L. (2000). Asymptotics in Statistics, Some Basic Concepts, 2nd ed., Springer.

Müller, U.U. and Wefelmeyer, W. (2010). Estimation in nonparametric regression with nonregular errors. Comm. Statist. Theo. Meth. 39, 1619-1629.

Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist. 24, 2399-2430.

Reiß, M. (2008). Asymptotic equivalence for nonparametric regression with multivariate and random design. Ann. Statist. 36, 1957-1982.

Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation, Springer Series in Statistics, Springer, New York.

van de Geer, S.A. (2006). Empirical Processes in M-Estimation, Reprint, Cambridge University Press, New York.